Curtis Roads
The Computer Music Tutorial
SECOND EDITION
The MIT Press
Cambridge, Massachusetts
London, England
© 2023 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.
Library of Congress Cataloging-in-Publication Data
Names: Roads, Curtis, author.
Title: The computer music tutorial / Curtis Roads.
Description: Second edition. | Cambridge, Massachusetts : The MIT Press, 2023. | Includes bibliographical references and index.
Identifiers: LCCN 2022003353 (print) | LCCN 2022003354 (ebook) | ISBN 9780262044912 (hardcover) | ISBN 9780262361545 (epub) | ISBN 9780262361538 (pdf)
Subjects: LCSH: Computer music—Instruction and study. | Computer sound processing. | Software synthesizers. | Computer composition (Music)
Classification: LCC MT56 .R6 2023 (print) | LCC MT56 (ebook) | DDC 780.285—dc23
LC record available at https://
LC ebook record available at https://
Publication supported by a grant from
The Community Foundation for Greater New Haven
as part of the Urban Haven Project
List of Figures
Figure 1.1 Mechanical recording session before 1900. Sound vibrations picked up by the large cone over the piano were transduced into vibrations of a cutting stylus piercing a rotating wax cylinder.
Figure 1.2 Amplion loudspeaker, as advertised in 1925.
Figure 1.3 Prototype of a portable Magnetophon tape recorder from 1935, made by AEG. (Photograph courtesy of BASF Aktiengesellschaft.)
Figure 1.4 A waveform zoomed in to show individual samples as they appear in a sound editor. The sound editor drew a line through them to enhance the display. The samples are all positive in amplitude; the line in the center represents 0 amplitude. The time span shown is about 700 µs (less than a thousandth of a second).
Figure 1.5 Nippon Columbia (Denon) digital audio recorder made in 1973, based on a 1-inch videotape recorder (right).
Figure 1.6 3M 32-track digital tape recorder, introduced in 1978.
Figure 1.7 Studer D820-48 DASH digital multitrack recorder, introduced in 1991 with a retail price of about $270,000. Making a backup copy of the tape required two machines.
Figure 1.8 Sony PCM-D100 field recorder.
Figure 2.1 Time-domain representation of a signal. The vertical dimension shows the air pressure. When the curved line is near the top of the graph, the air pressure is greater. Below the solid horizontal line, the air pressure is reduced. Atmospheric pressure variations heard as sound can occur quickly; for musical sounds, this entire graph might last no more than one-thousandth of a second (1 ms).
Figure 2.2 Time-domain and frequency-domain representations of four signals. (a) Time-domain view of one cycle of a sine wave. (b) Spectrum showing the single frequency component of a sine wave. (c) Time-domain view of one cycle of a sawtooth waveform. (d) Spectrum showing the exponentially decreasing frequency content of a sawtooth wave. (e) Time-domain view of one cycle of a complex waveform. Although the waveform looks complex, when it is repeated over and over its sound is actually simple, like a thin reed organ sound. (f) The spectrum of waveform (e) shows that it is dominated by a few frequencies. (g) A random noise waveform. (h) If the waveform is constantly changing (each cycle is different from the last cycle), then we hear noise. The frequency content of noise is very complex. In this case the analysis extracted 252 frequencies. This snapshot does not reveal how their amplitudes are constantly changing over time.
Figure 2.3 A sine waveform is equivalent to a cosine waveform that has been delayed by a quarter of a cycle, that is, phase shifted by 90°.
Figure 2.4 The effects of phase inversion. (b) is a phase-inverted copy of (a). (c) If the two waveforms are added together, they sum to zero.
Figure 2.5 Typical acoustic power levels for various acoustic sources. All values are relative to 0 dB.
Figure 3.1 The analog audio playback chain. An analog waveform is transduced from the grooves of a phonograph record into a voltage that is sent to a preamplifier, amplifier, and loudspeaker and then projected into the air.
Figure 3.2 Overview of digital recording and playback.
Figure 3.3 Analog and digital representations of a signal. (a) Analog sine waveform. The horizontal bar below the wave indicates one period or cycle. (b) Sampled version of the sine waveform in (a), as it might appear at the output of an ADC. Each vertical bar represents one sample. Each sample is stored in memory as a number that represents the height of the vertical bar. (c) Reconstruction of the sampled version of the waveform in (b). In effect, the samples are connected by the lowpass smoothing filter to form the waveform that eventually reaches the listener’s ear.
Figure 3.4 Problems in sampling. (a) Waveform to be recorded. (b) The sampling pulses; whenever a sampling pulse occurs, one sample is taken. (c) The waveform as sampled and stored in memory. (d) When the waveform from (c) is sent to the DAC, the output might appear as shown here (after Mathews 1969).
Figure 3.5 Aliasing effects. At the bottom of each set of three graphs, the thick black dots represent samples, and the dotted line shows the signal as reconstructed by the DAC. Every cycle of the sine waveform (a) is sampled eight times in (b). Using the same sampling frequency, each cycle of (d) is sampled only twice in (e). If the sampling pulses in (e) were moved to the right, the output waveform in (f) might be phase-shifted, although the frequency of the output would still be the same. In (h), there are ten samples for the eleven cycles in (g). When the DAC tries to reconstruct a signal, as shown by the dashed lines in (i), a sine waveform results, but the frequency has been completely changed due to the foldover effect. Notice the horizontal double arrow above (g), indicating one cycle of the input waveform, and the arrow above (i), indicating one cycle of the output waveform.
Figure 3.6 When the input frequency exceeds the Nyquist frequency, the frequency of the recorded signal folds over and proceeds downward.
Figure 3.7 Front and back panel of the RME Fireface UFX+ audio interface. The front panel shows four mic/line preamplifiers and MIDI jacks. Twelve analog inputs and outputs are supported. The rear panel bristles with connectors, including optical MADI multichannel jacks. The interface processes a total of ninety-six inputs and outputs and connects to a computer via USB 3 or Thunderbolt.
Figure 3.8 Impulse responses of an analog impulse (far left) as recorded, from left to right, by 48 kHz, 96 kHz, 192 kHz, and Direct Stream Digital (DSD) recording systems. DSD is discussed in chapter 4.
Figure 4.1 Measures of amplitude. (1) Peak amplitude. (2) Peak-to-peak amplitude. (3) RMS amplitude.
Figure 4.2 Effects of quantization. (a) Analog waveform. (b) Sampled version of the waveform in (a). Each sample can be assigned only certain values, which are indicated by the short horizontal dashes at the left. The difference between each sample and the original signal is shown in (c), where the height of each bar represents the quantization error.
Figure 4.3 Comparing the accuracy of 4-bit quantization with that of 1-bit quantization. The thin rounded curve is the input waveform. (a) 1-bit quantization provides two levels of amplitude resolution. (b) 4-bit quantization provides sixteen different levels of amplitude resolution.
Figure 4.4 Effect of quantization on sine wave smoothness. (a) Sine wave with ten levels of quantization, corresponding to a moderately loud tone emitted by a 4-bit system. (b) Smoother sinusoid emitted by an 8-bit system.
Figure 4.5 The sampling grid. The horizontal axis is time. The vertical axis is amplitude. (a) Crude approximation of a sine waveform caused by a low sampling rate and coarse quantization. (b) Increasing grid resolution results in a better approximation to the waveform. Further increases in grid resolution would approximate the original waveform still more closely.
Figure 4.6 Dither reduces harmonic distortion. (Top) Original signal. (Bottom) Postdithered signal.
Figure 5.1 Max V. Mathews, 1981.
Figure 5.2 IBM 704 computer, 1957.
Figure 5.3 Vacuum tube logic module for the IBM 704 computer.
Figure 5.4 Cover of the author’s copy of the Music from Mathematics vinyl record published by Bell Telephone Laboratories in 1961.
Figure 5.5 Music V synthesis patch. Lines 1–7 define the instrument, which features an envelope generator. The envelope F1 is defined in the GEN statement in line 8. A low-frequency sine wave oscillator applies vibrato to another oscillator by modulating its frequency (Mathews 1969).
Figure 6.1 Graphical depiction of wavetable lookup synthesis. The list 0–24 in the lower portion contains numbered locations or table index values. An audio sample value is stored in memory for each index point. The samples are depicted as the rectangles outlining a sine wave in the top portion. For example, Wavetable[0] = 0, and Wavetable[6] = 1. To synthesize the sine wave, the computer looks up the sample values stored in successive index locations and sends them to a DAC, looping through the table repetitively.
Figure 6.2 The phase increment or phasor (a ramp function) goes from 0 to N two times, creating two cycles of the sine wave.
Figure 6.3 Action of an interpolating oscillator. The graph shows two x points in a wavetable, at positions 27 and 28. The oscillator’s current phase indicates that the value should be read from position 27.5. The interpolating oscillator generates a y value between the values stored at positions 27 and 28 using a linear interpolation algorithm.
Figure 7.1 Oscillator with sound waveform input f1 and fixed parameters for amplitude and frequency.
Figure 7.2 Attack, decay, sustain, release (ADSR) amplitude envelope.
Figure 7.3 Oscillator with amplitude envelope f1 and sound waveform f2.
Figure 8.1 Excerpt of the author’s printout of the source code of Music V from 1974. The code is classic FORTRAN IV, with program control flow by GOTO statements. The first 16 lines are the end of the OUT unit generator. The rest is the beginning of the OSCIL unit generator. The code for a more modern version of Music V for GFortran and Linux is printed in Boulanger and Lazzarini (2011). GFortran continues to be maintained.
Figure 8.2 A simple orchestra and score in Music V (after Mathews 1969). Line numbers 1–17 have been added to simplify the explanation. Lines 1–4 in the code define instrument number 1 (an oscillator and an output). Line 5 generates the function F2. Lines 6–16 list note statements defining the score. Each note statement gives starting time, instrument number, duration, amplitude, and pitch (encoded in this case). Line 17 terminates the program.
Figure 8.3 Native Instruments FM8. This software synthesizer is optimized for frequency modulation synthesis.
Figure 8.4 Absynth patch window.
Figure 8.5 Madrona Labs Aalto, a patchable software synthesizer.
Figure 8.6 Unfiltered Audio LION, a patchable software synthesizer.
Figure 8.7 Arturia EMS Synthi V software synthesizer.
Figure 8.8 Max patch for frequency modulation.
Figure 8.9 SuperCollider code for frequency modulation synthesis.
Figure 8.10 Overview of a real-time synthesis system.
Figure 8.11 Audio callback loop that reads a wavetable and outputs audio to the DAC. Credit to Rodney Duplessis.
Figure 9.1 Native Instruments Kontakt, a popular software sampler. In this simple setup, six sound files are loaded in the sampler (right side). All six are triggered simultaneously by a MIDI keyboard.
Figure 9.2 Pierre Schaeffer’s studio for musique concrète at rue de l’Université, Paris, 1960. The studio features three tape recorders (left) along with a disk turntable. On the right is another tape recorder and the multiple-head Phonogène device (see figure 9.3). (Photograph courtesy of the Groupe de Recherches Musicales, Paris.)
Figure 9.3 Pierre Schaeffer playing the keyboard of the Phonogène, a tape-based transposer and time stretcher, 1953, Paris. (Photograph by Lido, supplied courtesy of the Groupe de Recherches Musicales.)
Figure 9.4 The Mellotron model 400 (1970).
Figure 9.5 The E-mu Emulator sampling keyboard instrument (1981).
Figure 9.6 Amplitude profile of a sound with a characteristic ADSR amplitude envelope. The best area for a smooth loop is the sustained portion.
Figure 9.7 The fundamental pitch period is equal to one cycle of a periodic waveform, in this case, a waveform emitted by an alto saxophone.
Figure 9.8 Splicing versus crossfading loops. (a) A butt splice of two parts of a waveform at a common zero point. The ending point of the loop splices to the beginning of the same wavetable loop. (b) Crossfade looping can be viewed as a fade out of the end of the loop overlapped by a fade in of the beginning of the loop.
Figure 9.9 Looping methods for smoothing out variations. (a) Three cycles of a bidirectional loop. (b) In a layered forward/backward loop the two versions are added together.
Figure 9.10 Pitch-shifting by sample-rate conversion with a constant playback sampling frequency. Top: If every other sample is skipped on playback, the signal is decimated, and the pitch is shifted up an octave. Bottom: If interpolation is used to produce twice as many samples on playback, the pitch is shifted down an octave.
Figure 9.11 With enough decimation, even a sine wave can be turned into a jagged waveform. (a) Original sinusoidal waveform. (b) Decimation of (a) by a factor of 8.
Figure 9.12 Time-domain plot of note-to-note transition of an ascending major third interval for a trumpet played tongued (a) and untongued (b). The time span for the plots is 120 ms.
Figure 9.13 Spectrum plots of the transitions shown in figure 9.12. The plots show fifty harmonics plotted over a time span of 300 ms, with lower harmonics at the back. (a) Tongued. (b) Untongued. Notice how the “hole” in the middle of (a) is filled in when the note transition is untongued (more continuous).
Figure 10.1 Pipe organ console with register stops on either side of the keyboards.
Figure 10.2 Additive synthesis of a complex tone in the Telharmonium. Sine wave harmonics from the tone-generating alternator are fed to bus bars (54). Pressing a key (C in this case) connects each harmonic to a multicoil transformer (56 Inductorium) where they mix. Each harmonic is attenuated to the desired level by the inductors in series with each winding (56a, b, etc.). The tap-switch inductors (60) regulate the amplitude of the mixing transformer output, as do the inductors near the loudspeakers (72, 73) at the listener’s end of the transmission line. (Cahill patent drawing, reproduced in Johnson et al. [1970].)
Figure 10.3 The author’s Hammond T422 organ, an additive synthesis instrument based on electromechanical tonewheels. Different mixtures of the various harmonics can be adjusted by pulling drawbars above the musical keys. The instrument contains a built-in Leslie rotating speaker.
Figure 10.4 Waveform synthesis by harmonic addition. (a) Histogram plot showing the relative strength of the harmonics on a linear scale. In this case, the histogram has energy only in the odd harmonics. The amplitude of the third harmonic is one-third that of the fundamental, the amplitude of the fifth harmonic is one-fifth that of the fundamental, and so on. (b) Approximation to a square wave synthesized by harmonic addition in (a).
Figure 10.5 Stages of harmonic addition as seen in a series of time domain waveforms. (a) Fundamental only. (b) First and third harmonics. (c) Sum of odd harmonics through the fifth. (d) Sum of odd harmonics through the ninth. (e) Quasi-square wave created by summing odd harmonics up to the 101st.
Figure 10.6 Effect of phase in additive synthesis. This waveform is the result of the same mixture of sine waves as in figure 10.5e except that the starting phase of the fifth harmonic is 90° instead of 0°.
Figure 10.7 Partial addition with four components, both harmonic and inharmonic. The percentage contribution of each component is 73, 18, 5, and 4 percent, respectively. (a) Frequency-domain view. (b) Time-domain waveform.
Figure 10.8 Jean-Claude Risset demonstrates a trumpet tone created by additive synthesis at Bell Telephone Laboratories, 1965. (Photograph courtesy of AT&T.)
Figure 10.9 Time-varying spectrum plot of twelve partials of a trumpet tone, with the highest partials in the foreground. Time goes from left to right. Notice that the fundamental (back) does not have the highest amplitude, but it lasts the longest.
Figure 10.10 Additive synthesis in the analog domain. Oscillators feed a mixer (bottom).
Figure 10.11 Digital time-varying additive synthesis with separate frequency (F) and amplitude (A) envelopes.
Figure 10.12 Main screen of the Native Instruments Razor additive synthesizer.
Figure 10.13 General overview of analysis/resynthesis. The modification stage may involve manual edits to the analysis data or modifications via cross-synthesis where the analysis data of one sound scales the analysis data from another sound.
Figure 10.14 Additive analysis/synthesis. A windowed input signal is analyzed by a filter bank into a set of frequency (F) and amplitude (A) envelopes or control functions that drive a set of oscillators. If the analysis data is not changed, the output signal should be almost the same as the input signal.
Figure 10.15 Drastic data reduction of analysis data for additive synthesis. Amplitude is plotted vertically, frequency goes from back to front, and time goes left to right. (a) Original spectrum plot of a violin tone. (b) The same violin tone as in (a), approximated with only three line segments per partial.
Figure 10.16 Display showing a time-domain waveform (top) of the Italian word prego and its tracked partials (bottom) in the range up to 4 kHz.
Figure 10.17 Overview of spectrum modeling synthesis. The input signal is divided into a deterministic part and a stochastic part. Each part can be modified separately before resynthesis. (See figure 37.17 for a more detailed view of the analysis stage.)
Figure 10.18 The first eight Walsh functions, 0 (top) to 7 (bottom).
Figure 10.19 Walsh function summation. (a) A simple sine wave approximation built by adding the Walsh functions shown in (b). (After Tempelaars [1977].)
Figure 11.1 Wavetable cross-fading. The bold outline traces the amplitude of a note event. Four waveforms cross-fade over the span of the event. The numbers at the bottom indicate the sequence of waveforms alone and in combination. Each region indicated at the bottom represents a separate timbre; thus the event cross-fades through seven timbres.
Figure 11.2 Wavetable cross-fading (vector synthesis) instrument using four wavetables. Each envelope on the right applies to a wavetable on the left.
Figure 11.3 Korg Wavestation vector synthesizer in its software incarnation.
Figure 11.4 Korg Wavestate synthesizer.
Figure 11.5 Piston Honda morphing wavetable oscillator.
Figure 11.6 Wavetable stacking. The signals from four oscillators are added together. Notice that the wavetables contain not simple periodic functions but long sampled sounds.
Figure 12.1 A waveform terrain is a three-dimensional surface. The height (z-axis) of the terrain represents the waveform value.
Figure 12.2 Wave terrains defined by mathematical functions. After James (2005).
Figure 12.2 (continued)
Figure 12.3 Elliptical trajectory and resulting signal. (a) Plot of the trajectory. Both the x and y dimensions vary from −1 to +1. (After Borgonovo and Haus [1986].) (b) Waveform generated by the elliptical trajectory over the wave terrain defined in equation 1. (Note: This waveform is an approximation redrawn from Borgonovo and Haus [1986].)
Figure 12.4 Looping trajectory. (After Borgonovo and Haus [1986].)
Figure 12.5 Aperiodic trajectory and resulting signal. (Top) Plot of the trajectory in eight passes through the wave terrain. (Bottom) Notice the time-varying waveform. (After Mitsuhashi [1982a].)
Figure 12.6 A simple WT synthesizer using the Jitter library in Max (after James [2005]). In this case, the WT used is the woven basket pictured at the bottom. All that is required is a table lookup with an interpolation routine, procedures that are performed by the jit.peek~ object.
Figure 12.7 WaveTerrain Synth by Aaron Anderson. Here three small trajectories scan the surface of a complex terrain.
图 13.1 谷物肖像。音频波形由短持续时间的振幅包络线塑造。该包络线呈钟形,具有平滑的起音和衰减。
Figure 13.1 Portrait of a grain. An audio waveform is shaped by a short-duration amplitude envelope. The envelope is bell shaped with a smooth attack and decay.
图 13.2 Gabor 矩阵。时频网格一小部分的细节(源自 Gabor [1946])。矩阵中的每个矩形内都包含一个时频能量的基本颗粒。阴影表示强度。与传统的声像图不同,Gabor 矩阵中时间轴垂直绘制,频率轴水平绘制。
Figure 13.2 Gabor matrix. A detail of a small portion of a time-frequency grid (from Gabor [1946]). Inside each rectangle on the matrix is an elementary grain of time-frequency energy. Shading indicates intensity. Unlike in a traditional sonogram, time is plotted vertically, and frequency is plotted horizontally.
图 13.3 由包络发生器和具有四通道空间输出的振荡器构建的简单颗粒合成仪器。
Figure 13.3 Simple granular synthesis instrument built from an envelope generator and an oscillator with four-channel spatial output.
图 13.4 颗粒包络。(a)高斯包络。(b)Tukey 包络。(c)三阶段线性包络。(d)两阶段三角形包络。(e)带限脉冲包络。(f)指数衰减包络(expodec)。(g)逆指数衰减包络(rexpodec)。
Figure 13.4 Grain envelopes. (a) Gaussian. (b) Tukey. (c) Three-stage linear. (d) Two-stage triangle. (e) Band-limited impulse. (f) Exponentially decaying (expodec). (g) Reverse exponentially decaying (rexpodec).
图 13.5 三个基本信号的时域函数(上)和频谱(下)(Blauert [1983] 后)。(a)对应于频谱中单条线的无限长正弦波。(b)高斯颗粒和相应的共振峰频谱。(c)理想脉冲和相应的无限长频谱。
Figure 13.5 Time-domain functions (top) and spectra (bottom) of three elementary signals (after Blauert [1983]). (a) Sine wave of infinite duration corresponding to a single line in the spectrum. (b) Gaussian grain and corresponding formant spectrum. (c) Ideal impulse and corresponding infinite spectrum.
图 13.6 颗粒持续时间对频谱的影响。(a)频率为 500 Hz 的 100 毫秒正弦颗粒颗粒流的声图。(b)如果我们将颗粒持续时间缩短到 1 毫秒,频谱将急剧爆炸成宽带噪声。
Figure 13.6 Effect of grain duration on spectrum. (a) Sonogram of a granular stream of 100 ms sinusoidal grains with a frequency of 500 Hz. (b) If we shrink the grain duration to 1 ms, the spectrum explodes dramatically into broadband noise.
图 13.7 采样声音的粒度。样本读取指针是粒度生成器在文件中开始读取样本的位置。(a)如果位置不变,则复制单个粒度。(b)如果指针随机跳动,则输出将被打乱。(c)在多文件粒度生成器中,纹理可以从三个不同的声音文件 A、B 和 C 的组合中显现出来。
Figure 13.7 Granulation of sampled sound. The sample read pointer is the location in the file where the granulator starts reading a sample. (a) If the position does not change, a single grain is replicated. (b) If the pointer jumps around randomly, the output is scrambled. (c) In a multifile granulator, a texture can emerge from a combination of three different sound files, A, B, and C.
图 13.8 时间轴上四股颗粒流的比较。左侧一列中,圆圈代表一粒颗粒。四股流的密度相同。(a)同步。(b)异步。(c)同步间歇。(d)异步间歇。右侧一列表示颗粒数量。间歇会导致颗粒损失。
Figure 13.8 Comparison of four granular streams on a time line. In the left-hand column, a circle represents a grain. Density is the same for all four streams. (a) Synchronous. (b) Asynchronous. (c) Synchronous with intermittency. (d) Asynchronous with intermittency. The right-hand column indicates the number of grains. Intermittency results in a loss of grains.
图 13.9 STFT 生成的时频网格细节。小军鼓敲击声,0-200 毫秒,0-200 赫兹。放大的声像图显示了每个时间帧和频率点的能量矩形单元。
Figure 13.9 Detail of a time-frequency grid produced by the STFT. Snare drum hit, 0–200 ms, 0–200 Hz. The zoomed-in sonogram displays the rectangular cells of energy at each time frame and frequency bin.
图 13.10 使用异步算法稀疏填充的声音云,该算法将粒子随时间随机分散。横轴表示时间,纵轴表示频率。(a)具有固定上下频率带宽的基本云。(b)具有线性变化的上下带宽(斜率均为正)的云。(c)具有线性变化的上下带宽(斜率均为正)的云。(d)具有连续变化的上下带宽的云。还要注意,粒子密度随时间而增加。
Figure 13.10 Sound clouds filled sparsely using an asynchronous algorithm that scatters the grains randomly in time. The horizontal axis is time, and the vertical axis is frequency. (a) Basic cloud with fixed upper and lower frequency bandwidth. (b) Cloud with linearly varying upper and lower bandwidth, both with positive slope. (c) Cloud with linearly varying upper (positive slope) and lower bandwidth (negative slope). (d) Cloud with continuously varying upper and lower bandwidth. Notice also that the grain density is increasing over time.
Figure 13.11 Cloud Generator app in action. The main panel sets the parameters of a cloud. The Synthetic waveform option is selected, which brings up a menu of options for the waveform. The Waveform Editor option is selected. Below these windows is another window showing the cloud generated from this specification. Individual grains are visible.
Figure 13.12 Screen of the EmissionControl2 app. Multiple sound files can be granulated simultaneously. The granulation parameter faders are on the top left. Each granulation parameter can be modulated by an LFO source. The faders at top right control the amount of modulation. The six LFOs are in the middle panel. Below it is a display of the current sound file being granulated. The bottom waveform display shows the output.
Figure 13.13 Instruo Arbhar module with legend of controls and indicators.
Figure 13.14 Some grain playback possibilities. See the text for an explanation. {a, … f} is a set of ordered grains.
Figure 13.15 MetaSynth granular clouds painted with a spray can tool. Time is the horizontal axis and frequency is the vertical axis.
Figure 13.16 Borderlands app for iPad. The user slides or scrubs a circle along the waveform to select segments to granulate. The attached satellite circles function as faders for control of synthesis parameters.
Figure 14.1 Scheme of subtractive synthesis. A spectrally rich source such as noise or pulses is fed into a filter, which shapes the output spectrum.
Figure 14.2 Sonogram images show the dramatic effect of two filters on a white noise signal. A sonogram plots the spectrum of a sound over time. The vertical axis is frequency, and the horizontal axis is time. The darker the trace, the more energy there is. (a) White noise lasting several seconds. The uniformly dark grey texture shows that the energy is equal in all audio frequencies. This plot is limited to 7 kHz. (b) A lowpass filter with a cutoff frequency at 1 kHz and a sharp slope eliminates the energy above 1 kHz. (c) A highpass filter with a cutoff frequency of 1 kHz and a sharp slope eliminates all the energy below 1 kHz.
Figure 14.3 The Albis Tonfrequenz filter, a graphic equalizer used in early electronic music compositions.
Figure 14.4 Two basic digital filters. (a) Delay the input and add it (feed-forward). (b) Delay the output and add it (feedback).
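The two structures in Figure 14.4 correspond to two difference equations. The sketch below assumes a delay of D samples and a gain g (the function names are illustrative). Feeding each filter an impulse shows why the feed-forward form has a finite impulse response and the feedback form an infinite, decaying one.

```python
def feedforward(x, delay, g):
    """y[n] = x[n] + g * x[n - delay]  (delay the input and add it)."""
    return [x[n] + (g * x[n - delay] if n >= delay else 0.0) for n in range(len(x))]

def feedback(x, delay, g):
    """y[n] = x[n] + g * y[n - delay]  (delay the output and add it)."""
    y = []
    for n in range(len(x)):
        y.append(x[n] + (g * y[n - delay] if n >= delay else 0.0))
    return y

impulse = [1.0] + [0.0] * 9
print(feedforward(impulse, 3, 0.5))  # a single echo at n = 3, then silence
print(feedback(impulse, 3, 0.5))     # echoes every 3 samples, each half as loud
```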
Figure 14.5 Phase effects. (a) Adding two in-phase signals results in a boost at that frequency. (b) Adding two out-of-phase signals results in phase cancelation.
Figure 14.6 Amplitude-versus-frequency response, colloquially called frequency response. The vertical axis is amplitude in decibels, and the horizontal axis is frequency. (a) Nearly flat response. (b) Nonlinear response.
Figure 14.7 Four common types of filters. The cutoff frequency is the point at which the response diminishes. For a bandpass or bandreject filter, a common parameter is a center frequency around which a band of frequencies is affected.
Figure 14.8 Shelving filters. (a) High shelving filter. Above the shelf point, the signal can be either boosted or cut. If the signal is cut, the effect of a high shelving filter is similar to that of a lowpass filter. (b) Low shelving filter. Below the shelf point, the signal can be either boosted or cut.
Figure 14.9 Screen image of FabFilter Pro-Q 2 with multiple filters superimposed.
Figure 14.10 Ideal versus nonideal filters. (a) In an ideal filter, the frequencies affected by the filter can be neatly divided into a passband and a stopband, and the cutoff is rectangular. (b) In a nonideal (actual) filter, the response curve shows ripple, and there is a more or less steep transition band between the passband and the stopband.
Figure 14.11 Filter slopes. (a) Gentle slope. (b) Steep slope.
Figure 14.12 FabFilter Pro-Q 2 brickwall lowpass (or high cut) filter with a 96 dB/octave slope applied to a noise signal. The spectrum display shows before (darker) and after (lighter).
Figure 14.13 A filter set at various values for Q. A high Q corresponds to a narrow response. The gain (height of the peak) is constant.
Figure 14.14 Different gain factors applied to the same filter. The bandwidth and Q remain constant.
Figure 14.15 The same constant Q filters plotted on linear and logarithmic frequency ranges. Filter 1 has a center frequency of 30 Hz and extends from 20 to 40 Hz in bandwidth. Filter 2 has a center frequency of 9 kHz and extends from 6 to 12 kHz. (a) Linear. (b) Logarithmic.
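Q is the ratio of center frequency to bandwidth. A quick check of the two filters in Figure 14.15 (the function name is illustrative):

```python
def q_factor(center_hz, low_hz, high_hz):
    """Q = center frequency / bandwidth, where bandwidth = high edge - low edge."""
    return center_hz / (high_hz - low_hz)

print(q_factor(30, 20, 40))          # filter 1: 30 / 20 = 1.5
print(q_factor(9000, 6000, 12000))   # filter 2: 9,000 / 6,000 = 1.5
```

Both filters also span exactly one octave (40/20 = 12,000/6,000 = 2), which is why they appear equally wide on the logarithmic axis but very different on the linear one.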
Figure 14.16 A ten-stage filter bank or spectrum shaper with a control knob (boost or attenuate) associated with each frequency band.
Figure 14.17 Graphic equalizer. (a) A seven-band graphic equalizer with linear potentiometers set to arbitrary levels. (b) The potential frequency response curve of a seven-band graphic equalizer.
Figure 14.18 Comb filter frequency response curves. The vertical scale is amplitude, and the horizontal scale is frequency plotted linearly. This filter has peaks and troughs every 2,000 Hz. (a) Finite impulse response (FIR) comb. (b) Infinite impulse response (IIR) comb.
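A comb filter's peak spacing is the sample rate divided by the delay length. The sketch below evaluates the magnitude responses of the simplest FIR comb, y[n] = x[n] + g·x[n − D], and IIR comb, y[n] = x[n] + g·y[n − D]. The values sr = 48,000 Hz, D = 24 samples, and g = 0.9 are assumptions chosen so that the peaks fall every 2,000 Hz, as in the figure.

```python
import cmath
import math

def fir_comb_gain(freq_hz, delay, g, sr):
    """|H(f)| of y[n] = x[n] + g*x[n-D]."""
    z = cmath.exp(-2j * math.pi * freq_hz * delay / sr)   # e^(-j*omega*D)
    return abs(1 + g * z)

def iir_comb_gain(freq_hz, delay, g, sr):
    """|H(f)| of y[n] = x[n] + g*y[n-D]; stable only for |g| < 1."""
    z = cmath.exp(-2j * math.pi * freq_hz * delay / sr)
    return abs(1 / (1 - g * z))

SR, D, G = 48000, 24, 0.9    # 48000 / 24 = 2000 Hz between peaks
for f in range(0, 5000, 1000):
    print(f, round(fir_comb_gain(f, D, G, SR), 3), round(iir_comb_gain(f, D, G, SR), 3))
```

With g = 0.9 the FIR comb swings between 1.9 at the peaks and 0.1 in the troughs, while the IIR comb's peaks rise to 1/(1 − g) = 10. The IIR peaks are therefore much sharper, and |g| must stay below 1 for stability.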
Figure 14.19 Vocoder. Stage 1 is the analysis part, and stage 2 is the synthesis. F stands for filter, ED stands for envelope detector, and A stands for voltage-controlled amplifier—an amplifier whose gain is determined by a control voltage fed into it from the envelope detector. The same structure can also be realized in digital form.
Figure 14.20 The effect of formant filters on an excitation function. (a) Simplified view of an excitation function, like the spectrum produced by the open vocal cords: a buzz sound with a number of equal-strength harmonics. (b) Simplified view of the spectrum of a vowel showing four formant peaks labeled 1, 2, 3, and 4.
Figure 14.21 BitSpeek plug-in for LPC analysis/resynthesis. The user interface resembles the Speak & Spell toy made by Texas Instruments.
Figure 14.22 Four stages of LPC analysis. Spectrum (formant) analysis, pitch detection, amplitude detection, and voiced/unvoiced analysis.
Figure 14.23 Relation of formant and inverse formant filters in the ideal case. (a) Result of formant filter. (b) Result of inverse formant filter.
Figure 14.24 A sequence of LPC frames as they might be displayed for editing purposes (after Dodge [1985]). The Phoneme column is added for clarity. The RMS2 column indicates the residual amplitude, and RMS1 is the original signal amplitude. ERR is an approximation to the ratio between the two; a high ratio indicates an unvoiced signal. PITCH is the estimated pitch in Hz, and DUR is the frame duration in seconds.
Figure 14.25 Overview of LPC synthesis.
Figure 14.26 LPC cross-synthesis takes the spectral envelope from one sound and maps it onto another sound.
Figure 15.1 Patching the output of a low-frequency oscillator (LFO) to the frequency modulation (FM) input of an oscillator. The white circles are jacks where patch cords can be inserted. The black FM knob determines the amount of modulation.
Figure 15.2 Modulation matrix.
Figure 15.3 Comparison of modulation spectra. Amplitude is the vertical axis and frequency is the horizontal axis. Modulation at audio rates shifts an input signal called the carrier (c). Here it is a single sine wave. When c is amplitude modulated (AM), it is surrounded by the sum and difference of the carrier and modulating frequencies. In ring modulation (RM), the carrier is suppressed. Single sideband modulation (SSM) results in either sum or difference frequencies, depending on the user's choice. Here we see the sum only. Finally, frequency modulation (FM) produces a series of sum and difference frequencies spread out over the spectrum.
Figure 15.4 Bipolar versus unipolar sine waves. (a) Bipolar sine varies between −1 and 1. (b) Unipolar sine varies between 0 and 1.
Figure 15.5 Two equivalent implementations of ring modulation or bipolar signal multiplication. The box to the side of each oscillator is its waveform. The top left input of each oscillator is the amplitude, and the top right input is the frequency. (a) RM by implicit multiplication within the carrier oscillator. (b) RM by explicit multiplication of the carrier and the modulator signals.
Figure 15.6 Ring modulation spectra. (a) For a carrier of 1,000 Hz and a modulator of 400 Hz, the sum and difference frequencies are 1,400 and 600 Hz, respectively. (b) For a carrier of 100 Hz and a modulator of 400 Hz, the sum and difference frequencies are 500 and −300 Hz, respectively.
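Ring modulation is plain multiplication of two bipolar signals, and the identity sin a · sin b = (1/2)[cos(a − b) − cos(a + b)] predicts the sum and difference frequencies. The sketch below checks this numerically with a single-bin DFT. The 8 kHz sample rate and 1 s length are assumptions chosen so that every frequency falls exactly on a DFT bin, and the function names are illustrative.

```python
import math

SR = 8000  # assumed sample rate; 1 s of audio gives 1 Hz DFT bins

def ring_mod(carrier_hz, mod_hz, sr=SR, dur_s=1.0):
    """Multiply two bipolar sine waves (ring modulation)."""
    n = int(sr * dur_s)
    return [math.sin(2 * math.pi * carrier_hz * i / sr)
            * math.sin(2 * math.pi * mod_hz * i / sr) for i in range(n)]

def amplitude_at(signal, freq_hz, sr=SR):
    """Amplitude of one sinusoidal component, measured with a single DFT bin."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sr) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

y = ring_mod(1000, 400)
for f in (600, 1000, 1400):
    print(f, round(amplitude_at(y, f), 3))   # sidebands near 0.5, carrier near 0
```

For case (b) in the figure, `amplitude_at(ring_mod(100, 400), 300)` also comes out near 0.5: the −300 Hz difference tone is heard as 300 Hz.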
Figure 15.7 Ring modulation waveforms over a 1 s time frame. (a) 1 Hz. (b) 4 Hz. (c) Ring modulation of (a) and (b). The component frequencies in (c) include the difference −3 Hz and the sum +5 Hz, indicated by the division of the waveform into three and five parts (solid line and dashed line, respectively) below.
Figure 15.8 Two forms of ring modulation. (a) Multiplication RM. (b) Diode-clipping (“chopper”) RM.
Figure 15.9 Comparison of the spectra produced by pitch shifting, ring modulation, and frequency shifting (SSM). (The vertical scale is amplitude and the horizontal scale is frequency.) Pitch shifting scales all the frequencies so that harmonic relations are preserved. Frequency shifting simply adds a constant value to all frequencies, resulting in inharmonic sounds.
Figure 15.10 Composer Vladimir Ussachevsky in 1958 with a Klangumwandler built by Heck and Burck of the Southwest German Radio in Baden-Baden.
Figure 15.11 SSM scheme using the Hartley method.
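The Hartley method relies on the identity cos a · cos b − sin a · sin b = cos(a + b). A full SSM processor needs a Hilbert transformer to produce the 90° phase-shifted copy of an arbitrary input. The sketch below sidesteps that by assuming a single cosine input, whose quadrature partner is simply the matching sine. The function name and parameters are illustrative.

```python
import math

def ssb_shift_cosine(f_in, shift_hz, n, sr=48000, upper=True):
    """Shift a pure cosine by shift_hz using the Hartley (quadrature) structure.

    A real SSM needs a Hilbert transformer to get the 90-degree copy of the
    input; for a single cosine that copy is just the matching sine (assumed).
    """
    out = []
    for i in range(n):
        t = i / sr
        in_phase = math.cos(2 * math.pi * f_in * t) * math.cos(2 * math.pi * shift_hz * t)
        quadrature = math.sin(2 * math.pi * f_in * t) * math.sin(2 * math.pi * shift_hz * t)
        # cos a cos b - sin a sin b = cos(a + b): upper sideband only
        # cos a cos b + sin a sin b = cos(a - b): lower sideband only
        out.append(in_phase - quadrature if upper else in_phase + quadrature)
    return out

up = ssb_shift_cosine(440, 100, 480)                  # a 540 Hz cosine
down = ssb_shift_cosine(440, 100, 480, upper=False)   # a 340 Hz cosine
```

Because every component is shifted by the same number of hertz, a harmonic input generally becomes inharmonic. This is the contrast with pitch shifting drawn in Figure 15.9.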
Figure 15.12 Applying an envelope to a signal is a simple case of infra-audio AM. The sine wave signal in (a) is multiplied by the envelope signal in (b) to produce the enveloped signal in (c).
Figure 15.13 Spectrum produced by AM of a 1 kHz sine wave by a 400 Hz sine wave. The two sidebands are at the sum and difference frequencies around the carrier frequency. The amplitude of each sideband is index / 2.
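The index/2 rule follows from (1 + I · sin m) · sin c = sin c + (I/2)[cos(c − m) − cos(c + m)]. The sketch below assumes a modulator of exactly that form (a bipolar sine scaled by the index I and offset by 1) and measures each component with a single-bin DFT. The sample rate, index value, and function names are illustrative assumptions.

```python
import math

SR = 8000  # assumed sample rate; 1 s of audio gives 1 Hz DFT bins

def am_tone(carrier_hz, mod_hz, index, sr=SR, dur_s=1.0):
    """Classic AM: the carrier is scaled by (1 + index * sin(modulator))."""
    n = int(sr * dur_s)
    return [(1 + index * math.sin(2 * math.pi * mod_hz * i / sr))
            * math.sin(2 * math.pi * carrier_hz * i / sr) for i in range(n)]

def amplitude_at(signal, freq_hz, sr=SR):
    """Amplitude of one sinusoidal component, measured with a single DFT bin."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sr) for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

y = am_tone(1000, 400, index=0.8)
for f in (600, 1000, 1400):
    print(f, round(amplitude_at(y, f), 3))   # expect about 0.4, 1.0, 0.4
```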
Figure 15.14 Time-domain view of audio frequency AM. The 1 kHz sine wave signal in (a) is modulated by the 40 Hz sine wave signal in (b) to produce the amplitude modulated signal in (c).
Figure 15.15 Two implementations of AM. (a) A simple instrument for AM where the modulating signal is assumed to be unipolar. (b) A more complicated instrument for AM with controls for the amount of modulation and the overall amplitude over the duration of the note event. The box to the side of each oscillator is its waveform. In the case of the envelope oscillators (denoted ENV OSC), the frequency is 1 / note_duration, so they read through their wavetable once over the duration of a note event. The Positive scaler module ensures that the modulation input to the adder varies between 0 and 0.5.
Figure 15.16 Feedback AM patch, after Kleimola et al. (2011). The output of a cosine oscillator is modulated by a feedback loop with a delay Δ and a feedback gain β.
Figure 15.17 Simple feedback AM. (a) Waveform. (b) Spectrum. After Kleimola et al. (2011).
Figure 16.1 A simple FM instrument. The bipolar output of the modulating oscillator is added to the fundamental carrier frequency, causing it to vary up and down. The amplitude of the modulator determines the amount of modulation, which is the frequency deviation from the fundamental carrier frequency.
Figure 16.2 FM spectrum showing sidebands equally spaced around the carrier C at multiples of the modulator M.
Figure 16.3 FM spectrum with increasing modulation index. (a) Carrier alone (I = 0). (b)–(e) Carrier plus sidebands for I = 1 to 4. The sidebands are spaced at intervals of the modulating frequency M and are symmetrical about the carrier C. (After Chowning 1973.)
Figure 16.4 Spectral plot showing the effects of reflected low-frequency sidebands. The C:M ratio is 1:√2, and the modulation index is 5. The downward lines indicate phase-inverted reflected components. (After Chowning 1973.)
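The sideband amplitudes in Figures 16.2 to 16.5 are Bessel functions of the first kind, J_k(I), placed at C ± kM. This sketch assumes the sine-phase formulation sin(2πCt + I sin 2πMt). It computes J_k by its power series, which is adequate for the moderate indices shown here but is not a production Bessel routine. Negative frequencies are folded back with a sign flip, which corresponds to the phase inversion drawn as downward lines in Figure 16.4. The function names are illustrative.

```python
import math

def bessel_j(n, x, terms=60):
    """J_n(x), Bessel function of the first kind, from its power series (n >= 0).

    Adequate for the moderate indices in these figures; not a robust routine.
    """
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.factorial(k + n))
                  * (x / 2) ** (2 * k + n))
    return total

def fm_spectrum(c_hz, m_hz, index, max_order=20):
    """Components of sin(2*pi*c*t + index*sin(2*pi*m*t)) as (freq_hz, amplitude).

    Sidebands sit at c + k*m with amplitude J_k(index), where
    J_-k = (-1)**k * J_k. Negative frequencies are folded back with a sign
    flip, since sin(-x) = -sin(x); coinciding components are summed.
    """
    spectrum = {}
    for k in range(-max_order, max_order + 1):
        amp = bessel_j(abs(k), index)
        if k < 0 and k % 2:      # odd-order lower sidebands are inverted
            amp = -amp
        f = c_hz + k * m_hz
        if f < 0:                # reflected component: phase-inverted
            f, amp = -f, -amp
        f = round(f, 6)
        if f == 0:               # sin(0 * t) is zero, so nothing at DC
            continue
        spectrum[f] = spectrum.get(f, 0.0) + amp
    return sorted(spectrum.items())

# Figure 16.4 setting: C:M = 1:sqrt(2), index 5
for f, a in fm_spectrum(100.0, 100.0 * math.sqrt(2), 5, max_order=6):
    print(f"{f:8.2f} Hz  {a:+.3f}")
```

The sum of J_k(I)² over all k is 1, so raising the index redistributes a fixed amount of power from the carrier into the sidebands rather than adding energy.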
Figure 16.5 Three-dimensional graph of the Bessel functions 1 to 15 (plotted back to front) as a function of modulation index I (plotted left to right) showing the number of sidebands generated (after Chowning 1973). Lines A, B, and C show the points at which the amplitude falls off by −40, −60, and −80 dB respectively. Line D indicates the cutoff point for perceptually significant sidebands. Line E is the maximum amplitude for each order. Lines F through K show the zero crossings of the functions and therefore values of the index that produce a null or zero amplitude for various side frequencies.
Figure 16.6 Simple FM instrument with envelopes for amplitude and frequency. This instrument also translates a user-specified modulation index envelope into a frequency deviation parameter.
Figure 16.7 A spectrum with three formant regions created with a three-carrier FM instrument.
Figure 16.8 Triple-carrier FM instrument driven by a single modulating oscillator (OSC MOD).
Figure 16.9 MM FM instruments. (a) Parallel MM FM. (b) Series MM FM.
Figure 16.10 Diagram showing the explosion in the number of partials produced by parallel MM FM. Each component emitted by the modulation of the Carrier by Modulator 1 is then modulated by Modulator 2, producing the list of spectral components shown at the bottom.
Figure 16.11 A plot of the harmonic spectrum of FM when the frequency of C is equal to that of M, for values of I ranging from I = 0 to I = 18 (after Mitsuhashi 1982c). Starting with (a) at the top left, read the graphs from left to right, top to bottom. Note how uneven the spectrum is, with partials going up and then down as the modulation index changes.
Figure 16.12 Feedback FM instrument. x is a phase increment to a sine wave lookup table. x is added to the output signal, which is fed back and multiplied by a feedback factor β.
Figure 16.13 Spectrum of a one-oscillator feedback FM instrument as the feedback factor β increases, with the phase increment x set at 200 Hz. The horizontal axis shows frequency plotted from 0 to 10 kHz. The vertical axis shows amplitude on a scale from 0 to 60 dB.
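A sketch of the one-oscillator feedback FM structure of Figures 16.12 and 16.13. It assumes a one-sample feedback delay and uses `math.sin` in place of a sine lookup table. The function name is illustrative. With β = 0 it reduces to a plain sine oscillator, and larger values of β add harmonics.

```python
import math

def feedback_fm(freq_hz, beta, n, sr=48000):
    """One-oscillator feedback FM: y[n] = sin(phase[n] + beta * y[n-1])."""
    out, prev, phase = [], 0.0, 0.0
    inc = 2 * math.pi * freq_hz / sr      # the phase increment x, in radians
    for _ in range(n):
        y = math.sin(phase + beta * prev) # fed-back output offsets the phase
        out.append(y)
        prev = y                          # one-sample feedback delay
        phase += inc
    return out

plain = feedback_fm(200, 0.0, 480)    # beta = 0: a pure 200 Hz sine
bright = feedback_fm(200, 1.5, 480)   # larger beta: a brighter, sawtooth-like wave
```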
Figure 16.14 Two-oscillator feedback FM instrument. The output of a feedback FM oscillator modulates a second, nonfeedback oscillator.
Figure 16.15 Spectrum generated by a two-oscillator feedback FM instrument as the feedback factor β increases from 0.0982 to 1.571. The frequency values for x1 and x2 are both set at 200 Hz, and the modulation index M is set to the constant value 2. The horizontal axis shows frequency plotted from 0 to 10 kHz. The vertical axis shows amplitude on a scale from 0 to 60 dB.
Figure 16.16 Three-oscillator indirect feedback FM instrument. A series of three oscillators modulate each other. Three modulation index factors β1, β2, and β3 determine the amount of modulation. The global output is fed back into the first modulating oscillator.
Figure 16.17 Jellinghaus DX-Programmer.
Figure 16.18 A simplified PM synthesis instrument realized with Csound opcodes. The modulating oscillator is shown at the top. The index of modulation is controlled by the LINSEG (line-segment) envelope function. The modulating signal is added to the output of a PHASOR signal. A PHASOR is a periodic sawtooth generator whose value is used to index the sine wave TABLE at the carrier frequency. A LINSEG envelope generator controls the amplitude level.
Figure 16.19 Effect of phase distortion on waveform shape. (a) The top image shows a normal linear phasor scanning function for a sine wave shown at the bottom. (b) Slightly distorted scanning function. (c) A greatly distorted scanning function produces a waveform that is spectrally rich.
Figure 17.1 Simple waveshaping instrument. A sinusoidal oscillator, whose amplitude is controlled by the amplitude envelope signal α, indexes a value in the shaping function table w. As in other example instruments, the input 1 / duration that is fed into the frequency input of the envelope oscillator indicates that it goes through one cycle over the duration of the note.
Figure 17.2 Shaping function shown with a linear response. The function maps an input signal, scaled over the range shown at the bottom, to an output signal whose scale is shown at the right. To see how the function maps an input to an output value, read vertically from the bottom and then look to the right to see the corresponding output value. Thus an input value of −0.4 on the bottom maps to an output value of −0.4 on the right. This equivalence between the input and the output is true only for a linear shaping function.
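A shaping function is a lookup table indexed by the input sample. The sketch below assumes a 4,097-point table over −1 to 1 with linear interpolation, and its names are hypothetical. The identity table reproduces the −0.4 → −0.4 mapping in Figure 17.2. The Chebyshev polynomial T₂(x) = 2x² − 1 shows how a nonlinear table adds harmonics: a full-amplitude cosine comes out an octave higher.

```python
import math

def make_table(func, size=4097):
    """Sample a shaping function w(x) at evenly spaced inputs from -1 to 1."""
    return [func(-1 + 2 * i / (size - 1)) for i in range(size)]

def waveshape(x, table):
    """Return w(x) by table lookup with linear interpolation; x is clipped to [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    pos = (x + 1) / 2 * (len(table) - 1)   # map -1..1 onto 0..size-1
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])

linear = make_table(lambda v: v)               # identity: output equals input
cheb2 = make_table(lambda v: 2 * v * v - 1)    # Chebyshev T2: cos(t) -> cos(2t)

print(waveshape(-0.4, linear))                 # approximately -0.4
t = 0.3
print(waveshape(math.cos(t), cheb2), math.cos(2 * t))  # nearly equal
```

Because the table is indexed by amplitude, scaling the input (the envelope α in Figure 17.1) changes how much of the nonlinear curve is traversed, and therefore changes the timbre.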
Figure 17.3 Four shaping functions. (a) Inversion of the input signal. (b) Attenuation. (c) Amplification of low-level signals (expansion) and clipping of high-level signals. (d) Complicated amplitude-sensitive distortion.
Figure 17.4 Waveshaping instrument with a normalization section. The value of α indexes a value in the normalization table that scales the output of the waveshaper.
Figure 17.5 Buchla 259 Complex Waveform Generator.
Figure 17.6 A sine wave is folded multiple times as its amplitude is increased from level 1 to level 8. Wavefolding begins at level 3. Courtesy Befaco.
Figure 18.1 Schematic view and mechanical network of a percussion instrument of a bar or rod type (Olson 1967).
Figure 18.2 Simulation of a barred chord on a fretted guitar neck (Bilbao and Torin 2015).
Figure 18.3 Mass-spring model of vibrating strings. (a) The springs model the elasticity of the string. (b) In a longitudinal wave, the disturbance is in the same direction as the wave propagation. The initial displacement (compression of the spring) is marked by an asterisk. (c) Following state. (d) In a transverse wave, the initial disturbance is perpendicular to the direction of wave propagation. (e) Following state.
Figure 18.4 Models of vibrating surfaces and volumes as masses connected by springs. The black dots are the masses, and the lines represent springs. (a) Model of a vibrating surface. (b) Model of a drum head as a circular arrangement of springs and masses. (c) A vibrating volume can be modeled as a lattice of masses connected by springs on six sides.
Figure 18.5 GENESIS display shows wave propagation in a plate.
Figure 18.6 Plucked string simulated by the MODALYS program. (a) Graphical representation. (b) MODALYS code in LISP corresponding to (a). Lines beginning with a semicolon are comments.
Figure 18.6 (continued)
Figure 18.7 Detail of REAKTOR PRISM front panel showing Exciter and Modal Bank.
Figure 18.8 Substantia modal synthesizer screenshot.
Figure 18.9 A string struck by a hammer at the center generates two waves moving in opposite directions. This behavior is the basis of the delay line paradigm of string vibration.
Figure 18.10 Generic waveguide instrument model capable of simulating stringed or wind instruments (after Cook [1992]). A nonlinear excitation injected into the upper delay line travels until it hits the scattering junction, which models the losses and dispersion of energy that occur at junctions in acoustical systems. Some energy returns to the oscillator junction, and some passes on to the output junction, modeled by a filter.
Figure 18.11 Waveguide approximation of noncylindrical tubes. (a) Smooth acoustic tube, such as an exotic horn or a portion of the vocal tract. (b) Approximation by partitioning the tube into sections—in effect, sampling in space.
Figure 18.12 Clarinet modeled as a five-part structure using waveguide techniques. Only a single hole is needed because the size of the upper and lower bores changes according to the pitch being played.
Figure 18.13 TBone brass instrument workbench.
Figure 18.14 Maracas model coded in C (Cook 1996, 1997, 2007).
Figure 18.15 Viscount Physis G1000 piano.
Figure 18.16 Core of the Karplus-Strong recirculating wavetable. The input to the recirculating wavetable switches to the noise source at the beginning of each event and then switches back to the modifier loop for the rest of the event. The modifier averages successive samples, simulating a damping effect.
Figure 18.17 The Karplus-Strong drum synthesis algorithm. The quantity b is the blend factor.
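A minimal Python sketch of the Karplus-Strong recirculating wavetable of Figures 18.16 and 18.17. It assumes the classic averaging rule y[n] = (1/2)(y[n − p] + y[n − p − 1]), an initial table of random ±1 values as the noise burst, and y[−1] = 0. With blend b = 1 the result is the plucked string. With b = 0.5 each recirculated sample's sign is flipped at random, which gives the drum variant. The function and parameter names are illustrative.

```python
import random

def karplus_strong(period, n, blend=1.0, seed=0):
    """Karplus-Strong recirculating wavetable (plucked string or drum).

    period: wavetable length p in samples; pitch is roughly sr / (p + 0.5)
    blend:  b = 1.0 gives the plucked string; b = 0.5 gives the drum variant,
            in which each recirculated sample is negated with probability 1 - b
    """
    rng = random.Random(seed)
    table = [rng.choice((-1.0, 1.0)) for _ in range(period)]  # initial noise burst
    out = []
    last = 0.0                        # y[n - 1], taken as 0 before the start
    for i in range(n):
        j = i % period
        current = table[j]            # y[n], which was written p samples ago
        out.append(current)
        avg = 0.5 * (current + last)  # modifier: average of two successive samples
        last = current
        # Store y[n + p]; the drum variant flips its sign at random
        table[j] = avg if rng.random() < blend else -avg
    return out

string = karplus_strong(100, 44100)            # about 439 Hz at sr = 44,100
drum = karplus_strong(100, 44100, blend=0.5)
```

The pitch is roughly sr / (p + 1/2) rather than sr / p, because the two-point average adds half a sample of delay to the loop.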
Figure 18.18 Graphical interface for scanned synthesis in the Wablet app by Rootnot. In this case, a two-dimensional circular mesh has been selected. (a) The finite-element model at rest, quietly vibrating at a low frequency. (b) The finite-element model after being excited by a touchscreen gesture. When this shape is scanned it produces a specific timbre that can be mapped to an arbitrary pitch with a slider or keyboard.
Figure 18.19 Parameter-estimation sound analyzer implemented by Wold (1987). The goal was to estimate parameters for a physical model–based synthesizer, with a view toward separation of two mixed signals. If a given estimate was too far from the approximate state-equation model, the system tried another iteration of estimation.
Figure 18.20 Block diagram of Singer, a physical model synthesizer for vocal sounds. The left section of the figure depicts the excitation sources. The center section depicts the waveguide resonators. The right section depicts the output stage. Two glottal wavetable oscillators (Glot1 and Glot2) allow slow, vibrato variations in the excitation signal. The glottal noise source consists of filtered white noise multiplied by a waveshape synchronized to the glottal oscillators. This permits pulsed noise to be mixed into the periodic source. A sine oscillator simulates vibrato, the frequency of which is randomized by noise. Filtered white noise is injected into the forward-moving glottal wave. Noise can be inserted into any number of waveguide sections. The mixed glottal source feeds into the vocal tract filter. Glottal reflections are modeled by a simple reflection coefficient, and a lowpass filter simulates lip and nostril effects. A lowpass filter and delay line model the radiation from the skin in the Throat output path.
Figure 18.21 Ableton Collision instruments for mallet sounds such as xylophones, marimbas, and glockenspiels, and for other percussion.
Figure 19.1 Reissue of Georges Jenny's 1968 article on the Ondioline (Forgotten Futures 2017).
Figure 19.2 Inside an analog audio circuit. The dark cylinders are capacitors. The dark squarish object at left is a transformer. One can also see resistors and op-amps.
Figure 19.3 Screen image of the Arturia Moog V, a virtual analog emulation of a Moog Synthesizer. The onscreen control panel mimics the original hardware.
Figure 19.4 The Softube Modular virtual synthesizer.
Figure 19.5 Native Instruments Reaktor Blocks front panel.
Figure 19.6 Approximation of a square wave by adding nine sinusoidal components at odd harmonic frequencies. (a) Spectrum. (b) Waveform. By adding more odd sine components, the waveform will be made more square-like while still being bandlimited.
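The additive construction in Figure 19.6 can be written directly as the Fourier series of a square wave, (4/π) · Σ sin(2πk f₀ t)/k over odd k. The sketch assumes f₀ = 440 Hz at a 48 kHz sample rate, where the ninth odd partial (the 17th harmonic, 7,480 Hz) stays well below the Nyquist frequency. The function name is illustrative.

```python
import math

def additive_square(t, f0, n_partials=9):
    """Sum of the first n odd harmonics with amplitudes 1/k, scaled by 4/pi."""
    total = 0.0
    for i in range(n_partials):
        k = 2 * i + 1                      # odd harmonics only: 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * f0 * t) / k
    return 4 / math.pi * total

f0, sr = 440.0, 48000
highest = (2 * 9 - 1) * f0                 # 17th harmonic = 7,480 Hz
assert highest < sr / 2                    # every partial is below Nyquist: bandlimited
print(round(additive_square(1 / (4 * f0), f0), 3))   # value at a quarter period, about 1.035
```

At a quarter period the nine-partial sum gives about 1.035 rather than 1. This ripple, and the larger Gibbs overshoot near each edge, is the price of keeping the waveform bandlimited.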
Figure 19.7 The Moog four-stage lowpass filter from the 1969 U.S. Patent 3,475,623, a component of the Moog 904 lowpass/highpass/bandpass/bandreject filter modules.
Figure 19.8 Control panel of the Clavia Nord Lead 4 virtual analog synthesizer with over sixty directly accessible buttons and knobs.
Figure 19.9 Peavey ReValver Module tweak page lets users modify the circuit schematic at the component level.
Figure 20.1 A formant region appears as a peak in the spectrum. Here a formant centers at 1 kHz.
Figure 20.2 A bank of FOF generators driven by input pulses that trigger an FOF grain at each pitch period. The output of all FOF generators is summed to generate a composite output signal.
Figure 20.3 FOF synthesis and processing configuration. The output can be sine waves, filtered noise, filtered sampled sounds, or a combination thereof.
Figure 20.4 FOF grain and spectrum. (a) A single grain or toneburst emitted by an FOF generator. (b) Spectrum of the grain in (a), plotted on a logarithmic amplitude scale. (After d'Alessandro and Rodet 1989.)
Figure 20.5 Formant spectrum of a vocal tone produced by several FOF generators in parallel.
Figure 20.6 FOF parameters. (a) Time-domain view of an FOF. Parameter p4 represents the attack time (called tex in most implementations), and p2 represents the decay (called atten). (b) Frequency-domain view of the four formant parameters. Parameter p1 is the center frequency of the formant, and p2 is the formant bandwidth. Parameter p3 is the peak amplitude of the formant, and p4 is the width of the formant skirt.
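A single FOF grain of the kind parameterized in figure 20.6 can be sketched as a sinusoid at the formant frequency multiplied by a local envelope. This sketch assumes the commonly published FOF envelope: a half-cosine rise over the attack time tex (p4), followed by an exponential decay at rate π × bandwidth. That decay rate is what links the time-domain decay (p2) to the formant bandwidth. The function `fof_grain` and its arguments are hypothetical and do not reproduce the interface of CHANT or Csound.

```python
import math

def fof_grain(formant_hz, bandwidth_hz, tex, sr, dur):
    """One FOF grain (hypothetical helper).

    formant_hz   -- center frequency of the formant (p1)
    bandwidth_hz -- formant bandwidth (p2); sets the decay rate pi * bandwidth
    tex          -- attack time in seconds (p4); a half-cosine rise
    """
    alpha = math.pi * bandwidth_hz
    n_samples = int(round(dur * sr))
    grain = []
    for n in range(n_samples):
        t = n / sr
        env = math.exp(-alpha * t)  # exponential decay
        if t < tex:
            env *= 0.5 * (1 - math.cos(math.pi * t / tex))  # smooth attack
        grain.append(env * math.sin(2 * math.pi * formant_hz * t))
    return grain
```

A longer `tex` narrows the formant skirt, as figure 20.7 shows. A larger bandwidth makes the grain die away faster.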
Figure 20.7 Effect of varying the attack time on the formant skirt bandwidth. Thin line, wide formant: p4 = 0.1 ms. Medium line, medium formant: p4 = 1 ms. Thick line, narrow formant: p4 = 10 ms.
Figure 20.8 A VOSIM pulse train.
Figure 20.9 Spectrum produced by a VOSIM oscillator with five pulses and an attenuation constant of 0.8. (After De Poli 1983.)
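The VOSIM pulse train of figures 20.8 and 20.9 can be sketched as a burst of sin² pulses, each one scaled down by a constant factor relative to the previous pulse, followed by a silent delay. This minimal sketch assumes equal pulse widths and geometric attenuation. The name `vosim_period` and the example values (five pulses, attenuation 0.8, 1 ms pulses, 2 ms delay) are illustrative.

```python
import math

def vosim_period(n_pulses, decay, pulse_width, delay, sr):
    """One period of a VOSIM signal (hypothetical helper).

    n_pulses    -- number of sin^2 pulses in the burst
    decay       -- attenuation constant applied from one pulse to the next
    pulse_width -- width of each pulse in seconds
    delay       -- silent interval after the burst, in seconds
    """
    width = int(round(pulse_width * sr))
    out = []
    amp = 1.0
    for _ in range(n_pulses):
        for n in range(width):
            out.append(amp * math.sin(math.pi * n / width) ** 2)
        amp *= decay  # each pulse is quieter than the last
    out.extend([0.0] * int(round(delay * sr)))
    return out

period = vosim_period(5, 0.8, 0.001, 0.002, 48000)
```

Repeating `period` end to end gives a pulse train whose repetition rate sets the pitch. The pulse width controls the position of the formant-like spectral peak.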
Figure 20.10 Window function pulse. (a) Pulse shown in time domain. (b) One side of the frequency spectrum. The left edge of the figure corresponds to the center frequency of the pulse, and the lobes represent sidebands, all of which are more than 70 dB down from the center frequency peak. (After Nuttall 1981.)
Figure 20.11 Time-domain view of two WF signals an octave apart. (a) Low-frequency signal. (b) Higher-frequency signal.
Figure 20.12 A stream of WF pulses multiplied by a periodic sequence of slot weights to obtain a series of weighted WF pulses.
Figure 20.13 Plots of the first twenty harmonics of the time-varying spectrum of an alto saxophone tone. Low harmonics are toward the back of the plot. (a) Original played on alto saxophone. (b) Synthetic tone created with WF synthesis. (After Goeddel and Bass 1984.)
Figure 20.14 Block diagram of ModFM synthesis with two carriers and two waveshapers (after Lazzarini and Timoney [2013]). A single phasor synchronizes all cosine generators. The modulator signal passes through two waveshapers with exponential functions. The spectrum mix is controlled by variable A in the range [0, 1].
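The core of ModFM can be sketched with a single phasor, as in figure 20.14. This sketch assumes the basic single-carrier form y = exp(k(cos θ − 1)) · cos(nθ), where θ is the shared phase and n is a harmonic number. The figure's second carrier and mix variable A are not reproduced here, and `modfm` and its parameters are hypothetical.

```python
import math

def modfm(f0, harmonic, k, sr, n_samples):
    """Single-carrier ModFM sketch (hypothetical helper).

    One phasor drives both the modulator cos(2*pi*phase) and the carrier
    cos(2*pi*harmonic*phase). The modulator passes through the exponential
    waveshaper exp(k * (x - 1)), whose output lies in (0, 1] for k >= 0.
    """
    out = []
    phase = 0.0
    incr = f0 / sr
    for _ in range(n_samples):
        mod = math.cos(2 * math.pi * phase)
        shaped = math.exp(k * (mod - 1.0))
        out.append(shaped * math.cos(2 * math.pi * harmonic * phase))
        phase = (phase + incr) % 1.0
    return out

tone = modfm(100.0, 3, 2.0, 48000, 480)
```

With k = 0 the waveshaper outputs 1 and only the carrier cosine remains. Raising k spreads energy into the sidebands around the carrier harmonic.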
Figure 21.1 Anatomy of a pulsar. (a) A pulsar consists of a brief burst of energy called a pulsaret w of a duration d followed by a silent interval s. The waveform of the pulsaret, here shown as a bandlimited pulse, is arbitrary. It could also be a sine wave or a period of a sampled sound. The total duration is p = d + s, where p is the fundamental period of the pulsar. (b) Evolution of a pulsar train, time-domain view. Over time, the pulsar period p remains constant while the pulsaret period d shrinks. The ellipses indicate a gradual transition period containing many pulsars between the three shown.
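The pulsar anatomy of figure 21.1 maps directly onto a loop over fundamental periods. This minimal sketch assumes a one-cycle sine pulsaret, so the pulsaret duration d sets the formant frequency (1/d) and the period p sets the fundamental (1/p). When d > p, the sketch truncates the pulsaret at the end of the period, as in figure 21.4g. `pulsar_train` is a hypothetical name.

```python
import math

def pulsar_train(fundamental_hz, formant_hz, n_periods, sr):
    """Pulsar train with a one-cycle sine pulsaret (hypothetical helper).

    Each period p = 1/fundamental holds a pulsaret of duration
    d = 1/formant followed by silence s = p - d. If d > p, the pulsaret
    is cut off at the end of the period.
    """
    p = int(round(sr / fundamental_hz))  # fundamental period in samples
    d = int(round(sr / formant_hz))      # pulsaret duration in samples
    period = [0.0] * p
    for n in range(min(d, p)):
        period[n] = math.sin(2 * math.pi * n / d)
    return period * n_periods

train = pulsar_train(100.0, 400.0, 3, 48000)  # 100 Hz fundamental, 400 Hz formant
```

Because the fundamental and the formant are set independently, sweeping `formant_hz` while holding `fundamental_hz` fixed reproduces the shrinking pulsaret of figure 21.1b.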
Figure 21.2 Typical pulsaret waveforms. In practice, any waveform can be used. (a) Sine. (b) Multicycle sine. (c) Bandlimited pulse. (d) Decaying multicycle sinusoid. (e) Cosmic pulsar waveform emitted by the neutron star Vela X-1.
Figure 21.3 Typical pulsaret envelopes v. (a) Rectangular. (b) Gaussian. (c) Linear decay. (d) Exponential decay. The term β determines the steepness of the exponential curve. (e) Linear attack, with duty cycle d. (f) Exponential attack. The term ξ determines the steepness of the exponential curve. (g) FOF envelope. (h) Bipolar modulator.
Figure 21.4 PWM and PulWM. (a) Classical PWM with a rectangular pulse shape. The ellipses indicate a gradual transition between the pulses. (b) PWM with duty cycle d = 0 results in a signal of zero amplitude. (c) PWM with duty cycle d = p (the fundamental period) results in a signal with a constant amplitude of 1. (d) Pulsar train with a sinusoidal pulsaret. (e) Same period as (d), but the duty cycle is increasing. (f) The duty cycle and the period are equal, resulting in a sinusoid. (g) The duty cycle is greater than the fundamental period, which cuts off the final part of the sine waveform.
Figure 21.5 Pulsar rhythms. Top: Pulse graph of rhythm showing rate of pulsar emission (vertical scale) plotted against time (horizontal scale). The left-hand scale measures traditional note values, and the right-hand scale measures frequencies. Bottom: Time-domain image of generated pulsar train corresponding to the plot at the top.
Figure 21.6 Effect of the pulsaret envelope on the spectrum. The top panel presents frequency-versus-time sonograms of an individual pulsar with a sinusoidal pulsaret, a fundamental frequency of 12 Hz, and a formant frequency of 500 Hz. The sonograms are based on 1,024-point fast Fourier transform plots using a von Hann window and are plotted on a linear frequency scale. Shown left to right are the sonograms produced by a rectangular envelope, an expodec envelope, and a Gaussian envelope. The bottom panel plots the spectra of these pulsars on a dB scale.
Figure 21.7 Schema of pulsar synthesis. A pulsar generator with separate envelope controls for fundamental frequency, formant frequency, amplitude, stochastic masking, and spatial position. In advanced pulsar synthesis, several generators may be linked with separate formant and spatial envelopes. A pulsar stream may be convolved with a sampled sound.
Figure 21.8 Pulsar masking turns a regular train into an irregular train. Pulsars are illustrated as quarter notes, and masked pulsars are indicated as quarter rests. (a) Burst masking. The burst ratio here is 3:3. (b) Channel masking. (c) Stochastic masking according to a probability table. When the probability is 1, there is no masking. When the probability is 0, there are no pulsars. In the middle, the pulsar train is intermittent. Notice the thinning out of the texture as the probability curve dips in the center.
Figure 21.9 Sonogram depicting the effect of burst masking in the audio frequency range. The pulsaret is one cycle of a sinusoid, and the pulsaret envelope is rectangular. The b:r ratio is 2:1. The fundamental frequency is 100 Hz, and the formant frequency is 400 Hz. Notice the subharmonics at 133 Hz and 266 Hz caused by the extended periodicity of the pulse masking interval (400 Hz/3).
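Burst and stochastic masking (figures 21.8 and 21.9) come down to deciding, pulsar by pulsar, whether each one sounds. This sketch assumes that a b:r burst ratio means b pulsars are emitted and then r are masked, and that stochastic masking draws one uniform random number per pulsar. The function names are hypothetical.

```python
import random

def burst_mask(n_pulsars, burst, rest):
    """Burst masking: emit `burst` pulsars, mask the next `rest`, and repeat.
    Returns one boolean per pulsar (True = the pulsar sounds)."""
    cycle = burst + rest
    return [(i % cycle) < burst for i in range(n_pulsars)]

def stochastic_mask(probabilities, seed=None):
    """Stochastic masking: pulsar i sounds with probability probabilities[i]
    (1 = never masked, 0 = always masked)."""
    rng = random.Random(seed)
    return [rng.random() < p for p in probabilities]

mask = burst_mask(9, 2, 1)  # the 2:1 ratio of figure 21.9
```

With a 100 Hz fundamental and a 2:1 burst ratio, the mask pattern repeats every three pulsars, that is, every 30 ms. The output train therefore gains a periodicity longer than the fundamental period.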
Figure 21.10 Effect of convolution with pulsar train. (a) Sampled sound, the Italian word qui (pronounced kwee). (b) Infrasonic pulsar train with a variable fundamental and formant frequency. (c) Convolution of (a) and (b).
Figure 21.11 Control panel of the PulsarGenerator application by Alberto de Campo and Curtis Roads.
Figure 21.12 A global view of the New Pulsar Generator (nuPG) program by Marcin Pietruszewski (2019). The program emulates the functionality of the classic PulsarGenerator by Roads and de Campo and provides a set of extensions such as per-pulsar processes (frequency and amplitude modulation, spatialization), parameter modulation, parameter linking, and sieve-based pulsar masking.
Figure 21.13 Screen image of the Hamburg Audio Nuklear pulsar synthesizer.
Figure 22.1 Simple interpolation techniques. (a) Original breakpoints. (b) Constant. (c) Linear.
Figure 22.2 Half-cosine interpolation. (a) Half-cosine drawn between two points A and B. Notice the two points of inflection (bending points). (b) Half-cosine interpolation between several points. (After Mitsuhashi [1982a].)
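The three interpolation rules of figures 22.1 and 22.2 differ only in how they move from one breakpoint to the next. This sketch assumes breakpoints stored as (x, y) pairs sorted by x. For the half-cosine case it uses the fraction (1 − cos(π·frac))/2, which leaves each breakpoint with zero slope. `interpolate` is a hypothetical helper.

```python
import math

def interpolate(breakpoints, x, mode="linear"):
    """Evaluate a breakpoint function at position x.

    breakpoints -- list of (x, y) pairs sorted by x
    mode        -- "constant" holds the left value, "linear" draws a
                   straight line, "halfcos" follows half a cosine cycle
    """
    if x <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= x < x1:
            frac = (x - x0) / (x1 - x0)
            if mode == "constant":
                return y0
            if mode == "halfcos":
                frac = (1 - math.cos(math.pi * frac)) / 2
            return y0 + (y1 - y0) * frac
    return breakpoints[-1][1]  # past the last breakpoint: hold its value

bp = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
curve = [interpolate(bp, i / 10, "halfcos") for i in range(21)]
```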
Figure 22.3 Instrument for waveform interpolation using the ITP unit generator found in some Music N software synthesis languages. The weight envelope specifies which waveform will predominate. When the weight envelope is 1, the left oscillator waveform is heard. When it is 0, the right oscillator plays. When it is 0.5, the waveform is the point-by-point average of the two waveforms.
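The waveform-interpolation instrument of figure 22.3 can be sketched as two table-lookup oscillators that share one phase and are mixed point by point by a weight envelope. This is not the ITP unit generator itself. The table size, the truncating (non-interpolating) lookup, and the name `interp_oscillators` are assumptions made for illustration.

```python
import math

TABLE_SIZE = 512
sine_table = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]
saw_table = [2 * i / TABLE_SIZE - 1 for i in range(TABLE_SIZE)]

def interp_oscillators(table_a, table_b, freq, weight_env, sr):
    """Mix two oscillators point by point as w*a + (1-w)*b.

    weight_env supplies w for each output sample:
    1 -> table_a only, 0 -> table_b only, 0.5 -> their average.
    """
    size = len(table_a)
    out = []
    phase = 0.0
    for w in weight_env:
        i = int(phase) % size  # truncating table lookup
        out.append(w * table_a[i] + (1 - w) * table_b[i])
        phase += freq * size / sr
    return out

# Morph from sine to sawtooth over one second at 48 kHz
n = 48000
ramp = [1 - k / (n - 1) for k in range(n)]
morph = interp_oscillators(sine_table, saw_table, 220.0, ramp, 48000)
```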
Figure 22.4 Effect of nonuniform breakpoints. (a) Curve drawn with uniform breakpoints. (b) Curve drawn with nonuniform breakpoints, yielding a better fit to the curve.
Figure 22.5 Fractal interpolation synthesis. (a) Four points. (b) Linear interpolation through the points in (a). (c) Function (b) is imposed on each point. (d) The result of eight iterations of this process. (e) Eighth iteration using the same points but with negative displacements (i.e., the waveform of the previous generation is inverted). (f) Eighth iteration of the same points but with positive and negative displacements.
Figure 23.1 A basic algorithm for concatenative synthesis. It analyzes a target sound and transforms it into a set of units described in specific ways, such as time, duration, and pitch. It then compares these units within the corpus, selects the best ones, and finally synthesizes them to create a result.
Figure 23.2 Segmenting a musical phrase (jazz trumpet) into gesture types (from Lindemann [2001]). S = silence. MA = medium attack. FS = flat sustain. SR = soft release. HA = hard attack. SDS = small downward slur.
Figure 23.3 Each corpus unit (dots) and the target unit (x) are described by the low-level descriptors root mean square (RMS) energy and spectral centroid. The algorithm defines the best corpus unit as the one closest to the target unit within the box demarcated by ΔSP and ΔRMS centered on the target unit. In this case, the best corpus unit is that labeled B.
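The selection rule of figure 23.3 can be sketched as a nearest-neighbor search restricted to a box. This sketch assumes that ΔSP and ΔRMS are half-widths of the box around the target, and that distance is Euclidean after each axis is scaled by its box size. The name `select_unit` and the descriptor values are hypothetical.

```python
def select_unit(corpus, target, d_sc, d_rms):
    """Return the index of the best corpus unit, or None if the box is empty.

    corpus -- list of (spectral_centroid, rms) tuples, one per unit
    target -- (spectral_centroid, rms) of the target unit
    d_sc, d_rms -- half-widths of the search box on each axis
    """
    best, best_dist = None, float("inf")
    for i, (sc, rms) in enumerate(corpus):
        dsc = sc - target[0]
        drm = rms - target[1]
        if abs(dsc) > d_sc or abs(drm) > d_rms:
            continue  # outside the box
        dist = (dsc / d_sc) ** 2 + (drm / d_rms) ** 2  # normalized distance
        if dist < best_dist:
            best, best_dist = i, dist
    return best

corpus = [(1000.0, 0.05), (1500.0, 0.3), (1200.0, 0.22), (3000.0, 0.2)]
best = select_unit(corpus, (1250.0, 0.2), 300.0, 0.1)
```

Scaling by the box size keeps the centroid axis (hundreds of hertz) from swamping the RMS axis (fractions of full scale).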
Figure 23.4 Screenshot of the CataRT system, built by Diemo Schwarz in Max (catart.lcd5). On the left are parameters for changing the display shown in the box at right. Each dot in the display is a sound unit in the corpus. The x-axis describes the spectral centroid of a unit, the y-axis describes its periodicity, and the color of a dot describes its loudness. The user has selected a set of corpus units by drawing an ellipse, and the system plays the selected units randomly. The mouse (pointer) can also be used to explore the units in this space.
Figure 23.5 Screen interface of the Vocaloid synthesizer.
Figure 23.6 Overview of the RPM synthesizer. The input MIDI stream includes note-on and note-off messages with pitch and velocity as well as continuous controller information that determines vibrato intensity, instrument loudness, timbre, and pitch bend. When a performer blows a trumpet harder, it is louder and brighter in timbre. RPM uses this correlation to predict a timbre of the instrument based on the slowly varying pitch and loudness derived from the MIDI stream. This timbre is represented as slowly varying amplitudes of the individual harmonics of the instrument sound. The slowly varying pitch and loudness and the basic underlying timbre lack the small, rapid fluctuations of real instruments. These rapid fluctuations in loudness, pitch, and timbre give an instrument its realism. This is especially true during transitions from note to note, when the fluctuations are rapid and unique for each instrument. To generate these rapid fluctuations, RPM relies on a database of recorded musical phrases. These are not isolated notes but continuous musical passages that represent articulation and phrasings: detached, slurred, portamento, sharp attacks, soft attacks, and so forth. (Lindemann 2007)
Figure 24.1 Waveforms etched on an optical disc used in the Welte Light-Tone Organ.
Figure 24.2 Mechanism of the Photona. Pressing a key on the keyboard causes a lamp to light. The light chopper modulates this light at a rate corresponding to the key depressed. An “electric eye” (photocell) picks up the alternating light beam and converts it to alternating voltage that is amplified to drive a loudspeaker.
Figure 24.3 Norman McLaren directly designed waveforms for an optical film soundtrack in the 1950s.
Figure 24.4 Oramics system.
Figure 24.5 ANS synthesizer at the Glinka Museum of Musical Culture, Moscow. The black panel at center is the etching surface.
Figure 24.6 Max Mathews (right) and Lawrence Rosler with the Graphic 1 console, controlled by a DEC PDP-5 computer, around 1968, Bell Telephone Laboratories.
Figure 24.7 Graphic 1 screen. Four envelopes for controlling synthesis, including envelopes for amplitude, frequency, duration, and glissando.
Figure 24.8 Xenakis with the first UPIC system in 1980.
Figure 24.9 Xenakis’s UPIC score Mycenae-Alpha (1980). The branch-like structures exemplify Xenakis’s use of arborescences (Harley 2004).
Figure 24.10 A page from a 1992 score by Gerard Pape, realized with a real-time UPIC system at Les Ateliers UPIC, Paris. The icons in the lower part of the screen represent a working set of waveforms and envelopes.
Figure 24.11 UPISketch screen image from iPhone. On the left is the palette of sounds. On the right is the pitch/time drawing area. Although hard to discern in this grayscale image, each gesture box contains two envelopes. At the top of each box is the amplitude envelope. Shown in faint gray (in reality, bright orange) is the pitch envelope. Notice that the gesture box labeled ee-3, at the upper right, has been selected for editing. At the top of the screen are tools for file input/output and settings, select and edit, zoom, draw, delete, undo, redo, and documentation.
Figure 24.12 Screen image of the Virtual ANS synthesizer.
Figure 24.13 MetaSynth’s ImageSynth screen.
Figure 25.1 Random AM (a) and FM (b).
Figure 25.2 Instrument definitions of random AM (a) and FM (b).
Figure 25.3 Random waveshaping functions.
Figure 25.4 Sine waves shaped by the functions shown in figure 25.3.
Figure 25.5 Plot of the values produced by the logistic map. (a) Rapidly changing chaotic waveform when λ is near 3.6. (b) An island of stability appears and remains stationary when λ is in the region of 3.8.
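The values plotted in figure 25.5 come from iterating the logistic map x[k+1] = λ·x[k]·(1 − x[k]). This minimal sketch generates the sequence and rescales it to the audio range [−1, 1]. The helper names and the starting value x0 are assumptions made for illustration.

```python
def logistic_map(lam, x0, n):
    """Iterate x[k+1] = lam * x[k] * (1 - x[k]) and return n values.
    For 0 <= lam <= 4 and x0 in [0, 1], every value stays in [0, 1]."""
    xs = []
    x = x0
    for _ in range(n):
        x = lam * x * (1 - x)
        xs.append(x)
    return xs

def to_audio(xs):
    """Map values from [0, 1] to the audio range [-1, 1]."""
    return [2 * x - 1 for x in xs]

chaotic = to_audio(logistic_map(3.6, 0.3, 1000))
```

Small changes in λ move the map between a fixed point, periodic cycles, and chaos. This sensitivity is why a slowly swept λ can produce the sudden islands of stability described in the caption.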
Figure 25.6 Hyperbolic cosine waveforms. From Xenakis (1992).
Figure 25.7 GENDYN polygon waveform indicating vertices and interpolation.
Figure 25.8 GENDYN mirror. The amplitude and time barriers P, N, T (positive, negative, time) defining a mirror constrain the next vertex generated from the vertex labeled with an asterisk. If the next vertex generated stochastically falls outside the barriers of the box (the initial projection I), the barrier P overrides this, reflecting the vertex back into the box (reflection R).
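The mirror of figure 25.8 keeps a random walk inside fixed barriers by reflecting any overshoot back into the box. This sketch simplifies GENDYN to a single-level random walk per vertex: amplitude is bounded by N and P, and duration by a minimum and T. Vertices are joined by linear interpolation, as in figure 25.7. The function names, step sizes, and bounds are assumptions; Xenakis's program layers further random walks on top of this.

```python
import random

def mirror(value, low, high):
    """Reflect value back into [low, high] as many times as needed."""
    span = high - low
    if span <= 0:
        return low
    v = (value - low) % (2 * span)        # fold into one up-and-back cycle
    return low + span - abs(v - span)     # reflect the upper half

def gendyn_vertices(n_vertices, n_iterations, amp_step, dur_step,
                    amp_min=-1.0, amp_max=1.0, dur_min=1.0, dur_max=64.0, seed=0):
    """Random-walk each vertex's (duration, amplitude) inside mirrors."""
    rng = random.Random(seed)
    amps = [0.0] * n_vertices
    durs = [(dur_min + dur_max) / 2] * n_vertices
    for _ in range(n_iterations):
        for i in range(n_vertices):
            amps[i] = mirror(amps[i] + rng.uniform(-amp_step, amp_step), amp_min, amp_max)
            durs[i] = mirror(durs[i] + rng.uniform(-dur_step, dur_step), dur_min, dur_max)
    return list(zip(durs, amps))

def render_period(vertices):
    """One waveform period: linear interpolation between successive vertices."""
    out = []
    for i, (dur, amp) in enumerate(vertices):
        next_amp = vertices[(i + 1) % len(vertices)][1]
        n = max(1, int(round(dur)))  # segment length in samples
        for k in range(n):
            out.append(amp + (next_amp - amp) * k / n)
    return out
```

Calling `gendyn_vertices` repeatedly with more iterations and rendering each result gives a sequence of periods that drift in both waveform and pitch, in the spirit of figure 25.9.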
Figure 25.9 Waveform generated by the GENDYN program. The progression starts at the top and proceeds to the bottom.
Figure 25.10 Detail of the screen of Soniclab Cosmosf FX7 synthesizer. The original has an inverse color scheme, that is, a black background.
Figure 26.1 Mixing on a sample time scale. Signals (a) and (b), each consisting of two samples at time points t1 and t2, are mixed; the result is signal (c).
Figure 26.2 Waveform mixing on a micro time scale. (a) Sine tone at 50 Hz. (b) Sine tone at 500 Hz. (c) Mix of (a) + (b).
Figure 26.3 Sound object mixing. (a) Alto saxophone tone. (b) Granular synthesis texture. (c) Mix of (a) + (b).
Figure 26.4 Interface of the pioneering SoundEdit app (1986), showing three overlapping sounds. Each waveform clip could be selected by the mouse and freely positioned anywhere on the timeline.
Figure 26.5 Screen image of a Pro Tools mix with a metering and spectral analyzer plug-in on the master track.
Figure 26.6 Producer George Martin (The Beatles) adjusting a custom-built EMI mixer in the early 1960s at Abbey Road Studios, London.
Figure 26.7 AMS Neve BCM-10 mk2 analog mixing console costing $70,000 in 2020.
Figure 26.8 Signal flow in a simple 8/4/2 mixer, also showing the different sections of the mixer. The squares represent push-button switches, and the circles represent rotary knobs. 01 through 04 indicate output buses, as do L and R (for left and right). The indicators CM and SM in the monitor section refer to the control room monitor and studio monitor level controls, respectively.
Figure 26.9 Stages of a simple input module on a mixer. Table 26.1 explains each stage.
Figure 26.10 A large hybrid mixing console with motorized fader automation, the AMEK 9098i, designed by Rupert Neve and Graham Langley.
Figure 26.11 LAWO mc2 96 large-format digital mixing console runs at a 96 kHz sampling rate with 24-bit quantization and 40-bit floating-point signal processing. This luxury mixer offers touch-screen control, dozens of video displays, and hundreds of physically accessible knobs, buttons, and faders. This stands in contrast to a small digital mixer on which the controls can be accessed only through a labyrinth of menus and submenus.
Figure 26.12 In an assignable console, each input channel has a fader, but the console has just one set of controls for equalization, dynamics, output bus assignment, and so forth. Access to a control on any channel is obtained by touching an assign button (marked A) above the relevant fader. This switches control to that channel. In this figure, channel 2 accesses two parametric equalization units and a dynamic range expander and routes its output to several output buses. Endless-turn rotary knobs are ideal assignable controls.
Figure 26.13 Studer J37 four-track tape recorder from 1964.
Figure 26.14 Three stereo monitoring environments. (a) Near-field. (b) Control room. (c) Listening room.
Figure 27.1 Reshaping the amplitude envelope of a harpsichord tone. (a) Original tone. (b) New envelope drawn by hand. (c) Reshaped harpsichord tone that follows the outline of the new envelope.
Figure 27.2 Operation of a noise gate. (a) Without a noise gate, a musical signal that contains low-level noise fades to noise. (b) With a noise gate, the fading signal crosses the noise gate threshold so that the noise gate switches in. Hence the signal fades to silence instead of to a mixture of signal and noise.
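A noise gate like the one in figure 27.2 can be sketched as a peak envelope follower compared against a threshold. This minimal sketch assumes an instantaneous attack, an exponential release, and hard on/off gain switching. Practical gates smooth the gain to avoid clicks. `noise_gate` and its parameters are hypothetical.

```python
import math

def noise_gate(signal, threshold, sr, release_s=0.05):
    """Pass samples while the envelope is at or above threshold; output 0 otherwise.

    The envelope jumps up to each new peak and otherwise decays
    exponentially with a time constant of release_s seconds.
    """
    decay = math.exp(-1.0 / (release_s * sr))
    env = 0.0
    out = []
    for x in signal:
        env = max(abs(x), env * decay)  # peak follower
        out.append(x if env >= threshold else 0.0)
    return out

# A loud burst followed by low-level noise: the gate closes once the envelope falls
gated = noise_gate([0.5] * 10 + [0.001] * 1000, 0.01, 1000)
```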
Figure 27.3 Dynamic range processing. The left column shows the transfer functions associated with the various processing methods. (a) Original signal—a cymbal crash—with a linear transfer function. (b) Soft compression of peaks scales them down several decibels. (c) Hard limiting flattens peaks to keep them within the threshold boundaries indicated by T. (d) Expansion exaggerates peaks, creating several new ones.
Figure 27.4 Soft versus hard knee compression.
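The static transfer curves of figures 27.3 and 27.4 can be sketched in the decibel domain. This sketch assumes a threshold, a ratio, and an optional knee width, and uses a quadratic soft knee. That is one common textbook formulation, not necessarily the one plotted in the figures. A very large ratio approximates a limiter, and a ratio below 1 expands levels above the threshold. `compressor_curve` is a hypothetical name.

```python
def compressor_curve(level_db, threshold_db, ratio, knee_db=0.0):
    """Output level (dB) for a given input level (dB).

    knee_db = 0 gives a hard knee; knee_db > 0 blends quadratically
    across a region knee_db wide, centered on the threshold.
    """
    over = level_db - threshold_db
    if knee_db > 0 and 2 * abs(over) <= knee_db:
        return level_db + (1 / ratio - 1) * (over + knee_db / 2) ** 2 / (2 * knee_db)
    if over <= 0:
        return level_db  # below threshold: unity gain
    return threshold_db + over / ratio

# Gain to apply, in dB, for a -8 dB input with a 4:1 ratio at -20 dB threshold
gain_db = compressor_curve(-8.0, -20.0, 4.0) - (-8.0)
```

The soft-knee branch meets the other two branches exactly at the knee edges, so the curve has no kink.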
Figure 27.5 Noise reduction units compress on recording and expand on playback.
Figure 27.6 Companding noise reduction unit reduces the wide dynamic range going into the noisy channel. It tries to keep the signal above the noise level and below the clipping level. The final stage of the compander expands the dynamic range.
Figure 27.7 Dynamic range processor with sidechain input.
Figure 27.8 The superposition of two waveforms. The center bar shows a compressed version that has been turned down in volume, and the upper and lower parts of the image show the original transients that have been flattened and eliminated by compression. The compressed version in the center bar lacks dynamic range variation (or “punch”).
Figure 28.1 A filter as a black box with an input x and an output y.
Figure 28.2 Two LTI filters (A and B) can be swapped in sequence and the resulting y will be the same.
Figure 28.3 LTI filters consist of only three simple components: adders (+ or −), scalar multipliers (triangle), and delays (D).
Figure 28.4 LTI filter effect on amplitude and phase. The input signal is the larger amplitude, and the output is the sinusoid with smaller amplitude. Notice the delay between the first peaks, indicated by the arrows.
Figure 28.5 Two first-order filters. (a) Finite-impulse response (FIR) feed-forward filter. (b) Infinite-impulse response (IIR) feedback filter.
Figure 28.6 General form of an LTI filter as a tapped delay line.
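The two first-order structures of figure 28.5, and the tapped delay line of figure 28.6, can be written directly from their difference equations. The coefficient names and values below are hypothetical. The point is that the feed-forward impulse response ends after the last tap, while the feedback response decays indefinitely.

```python
def fir_filter(x, coeffs):
    """Tapped delay line (FIR): y[n] = sum_k coeffs[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * x[n - k]
        y.append(acc)
    return y

def one_pole_iir(x, a, b):
    """First-order feedback (IIR) filter: y[n] = a * x[n] + b * y[n - 1]."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * xn + b * prev
        y.append(prev)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, [0.5, 0.5]))  # finite: two nonzero taps, then zeros
print(one_pole_iir(impulse, 1.0, 0.5))  # infinite: halves at every sample
```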
Figure 28.7 Impulse response of a filter that simulates room reverberation.
Figure 28.8 The spectra of three signals. (a) Large amount of energy in a band centered around a high frequency. (b) Filter that reduces the peak in (a). (c) Resulting spectrum after filtering (a) by (b).
Figure 28.9 Magnitude and phase response of a lowpass filter.
Figure 28.10 Pole and zero of a digital filter. (a) A pole creates a peak in the frequency response. (b) A zero creates a trough.
Figure 28.11 Two equivalent diagrams of a simple lowpass filter.
Figure 28.12 Magnitude response of the simple averaging lowpass filter shown in figure 28.11.
Figure 28.13 Lowpass filter that averages two samples. It has a sharper rolloff than the simple filter in figure 28.12.
Figure 28.14 Lowpass filter that averages four samples.
Figure 28.15 Simple highpass filter response.
Figure 28.16 Response of a double difference highpass filter.
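The averaging and differencing responses in figures 28.12 through 28.16 can be checked by evaluating H(f) = Σ b[k]·e^(−j2πfk/sr) for short coefficient lists. The scale factors (1/2, 1/4) that normalize the peak gain to 1 are assumptions; the figures may use unnormalized versions.

```python
import cmath
import math

def magnitude_response(b, freqs, sr):
    """|H(f)| of the FIR filter y[n] = sum_k b[k] x[n-k] at each frequency (Hz)."""
    result = []
    for f in freqs:
        w = 2 * math.pi * f / sr
        h = sum(bk * cmath.exp(-1j * w * k) for k, bk in enumerate(b))
        result.append(abs(h))
    return result

average2 = [0.5, 0.5]                   # lowpass: (x[n] + x[n-1]) / 2
average4 = [0.25] * 4                   # lowpass: mean of four samples
difference = [0.5, -0.5]                # highpass: (x[n] - x[n-1]) / 2
double_difference = [0.25, -0.5, 0.25]  # highpass: (x[n] - 2x[n-1] + x[n-2]) / 4
```

The averaging filters pass 0 Hz at full gain and null the Nyquist frequency; the difference filters do the opposite. The four-point average also places a null at a quarter of the sampling rate, which accounts for its steeper rolloff.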
Figure 28.17 Bandpass filter response.
Figure 28.18 Exponential smoothing filter with feedback.
Figure 28.19 Response of an exponential smoothing filter for six different values of the multiplier a.
Figure 28.20 Response of a FIR comb filter for three different orders: 4, 8, and 11 samples.
Figure 28.21 Response of a subtracting FIR comb filter.
Figure 28.22 IIR comb filter response for three different delays: 4, 8, and 11 samples.
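The comb filters of figures 28.20 through 28.22 differ only in whether the delayed path taps the input (FIR) or the output (IIR). This minimal sketch assumes a single delay of `delay` samples and a hypothetical `gain` parameter.

```python
def fir_comb(x, delay, gain=1.0, sign=1):
    """Feed-forward comb: y[n] = x[n] + sign * gain * x[n - delay].
    sign=+1 gives the adding comb, sign=-1 the subtracting comb."""
    return [xn + sign * gain * (x[n - delay] if n >= delay else 0.0)
            for n, xn in enumerate(x)]

def iir_comb(x, delay, gain):
    """Feedback comb: y[n] = x[n] + gain * y[n - delay]; requires |gain| < 1."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn + gain * (y[n - delay] if n >= delay else 0.0))
    return y

impulse = [1.0] + [0.0] * 11
print(fir_comb(impulse, 4))        # a single echo 4 samples later
print(iir_comb(impulse, 4, 0.5))   # echoes every 4 samples, halving each time
```

With gain 1, the adding FIR comb has peaks at multiples of sr/delay and notches halfway between them; the subtracting version swaps peaks and notches. The feedback comb sharpens the peaks as `gain` approaches 1.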
Figure 28.23 Allpass filter response for orders 3, 5, and 7.
Figure 28.24 Allpass filter.
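An allpass filter like the one in figures 28.23 and 28.24 changes phase but not magnitude. This sketch assumes the common Schroeder form y[n] = −g·x[n] + x[n−D] + g·y[n−D], with the delay D playing the role of the order. The figure's exact structure may differ, and the function names are hypothetical.

```python
import cmath
import math

def allpass(x, delay, g):
    """Schroeder allpass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + x_d + g * y_d)
    return y

def allpass_gain(delay, g, f, sr):
    """|H(f)| for H(z) = (-g + z^-D) / (1 - g z^-D); equals 1 at every f."""
    z_d = cmath.exp(-1j * 2 * math.pi * f / sr * delay)  # z^-D on the unit circle
    return abs((-g + z_d) / (1 - g * z_d))
```

Because the magnitude is flat, the filter's whole effect lies in its frequency-dependent delay. This is why allpass sections are building blocks for reverberators and phasers.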
Figure 28.25 Example specifications for the passband, the stopband, and the transition bands of a lowpass filter.
Figure 28.26 Comparison of the magnitude and phase response of an FIR (gray) and an IIR (black) filter designed to the same specifications.
Figure 28.27 Second-order section, or biquad, filter.
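A second-order section (figure 28.27) is usually run in direct form I or II. This sketch uses direct form I, with lowpass coefficients from Robert Bristow-Johnson's widely used Audio EQ Cookbook formulas. The figure does not specify a design, so both the cookbook lowpass and the function names are assumptions.

```python
import math

def biquad_lowpass_coeffs(f0, q, sr):
    """Cookbook lowpass, normalized so a0 = 1. Returns (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * f0 / sr
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cos_w0) / 2 / a0
    b1 = (1 - cos_w0) / a0
    b2 = b0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    return b0, b1, b2, a1, a2

def biquad(x, coeffs):
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift the input history
        y2, y1 = y1, yn  # shift the output history
        y.append(yn)
    return y

coeffs = biquad_lowpass_coeffs(1000.0, 0.7071, 48000)
```

Higher-order filters are commonly built by cascading such sections, which keeps each stage numerically well behaved.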
Figure 29.1 Time-domain versus frequency-domain representations of sound. (a) A time-domain waveform of a brief segment of sound lasting around 50 ms. This segment has been scaled by an envelope (or window) so that it fades in and fades out. (b) The frequency-domain spectrum of the sound in (a), showing the level or magnitude of each frequency from 0 Hz to the Nyquist frequency.
Figure 29.2 Prototypical examples of sample convolution. (a) Convolution of an input signal with the unit impulse is an identity operation. (b) Convolution with a scaled unit impulse of value 0.5 scales the input by 0.5. (c) Convolution with a delayed or time-shifted unit impulse time-shifts the input sequence correspondingly.
Figure 29.3 Time-domain effects of convolution. (a) Convolution with two impulses spaced widely apart produces an echo effect. (b) Convolution with two impulses close together produces a time-smearing effect.
Figure 29.4 Sound x is convolved with an IR lasting n samples. For each sample in the IR, we replace it with a scaled and delayed copy of x.
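The scheme of figure 29.4 (replace each IR sample with a scaled, delayed copy of x and sum the copies) is direct convolution. This minimal sketch uses plain lists; `convolve` is a hypothetical helper. The examples reproduce the identity, scaling, and time-shift cases of figure 29.2.

```python
def convolve(x, ir):
    """Direct convolution: each IR sample h[k] adds a copy of x scaled by h[k]
    and delayed by k samples. Output length is len(x) + len(ir) - 1."""
    y = [0.0] * (len(x) + len(ir) - 1)
    for k, h in enumerate(ir):
        if h == 0:
            continue  # a zero tap contributes nothing
        for n, xn in enumerate(x):
            y[n + k] += h * xn
    return y

x = [1.0, 2.0, 3.0]
print(convolve(x, [1.0]))            # unit impulse: identity
print(convolve(x, [0.5]))            # scaled impulse: scales x by 0.5
print(convolve(x, [0.0, 0.0, 1.0]))  # delayed impulse: shifts x by 2 samples
```

The cost is proportional to len(x) × len(ir), which becomes prohibitive for long reverberation IRs. That cost motivates the fast convolution scheme of figure 29.5.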
Figure 29.5 Fast convolution scheme.
Figure 29.6 Deconvolution. At the top, the original convolved signal. In the center, the divisor. At the bottom, the deconvolved signal.
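Fast convolution (figure 29.5) multiplies spectra instead of sliding one signal past the other, and deconvolution (figure 29.6) divides them. This sketch assumes a simple recursive radix-2 FFT with zero-padding to a power of two long enough to avoid circular wraparound. It also adds a small regularization term `eps` to the spectral division, which is an assumption to keep near-zero bins from blowing up. All names are hypothetical.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is left unscaled (divide by n afterward)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def next_pow2(n):
    size = 1
    while size < n:
        size *= 2
    return size

def fast_convolve(x, h):
    """Linear convolution via FFT: zero-pad, multiply spectra, inverse FFT."""
    n_out = len(x) + len(h) - 1
    size = next_pow2(n_out)
    X = fft(list(x) + [0.0] * (size - len(x)))
    H = fft(list(h) + [0.0] * (size - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / size for v in y[:n_out]]

def deconvolve(y, h, n_x, eps=1e-12):
    """Estimate x from y = x * h by regularized spectral division Y / H."""
    size = next_pow2(len(y))
    Y = fft(list(y) + [0.0] * (size - len(y)))
    H = fft(list(h) + [0.0] * (size - len(h)))
    X = [a * b.conjugate() / (abs(b) ** 2 + eps) for a, b in zip(Y, H)]
    x = fft(X, inverse=True)
    return [v.real / size for v in x[:n_x]]
```

Deconvolution only succeeds where the divisor's spectrum has energy. Frequencies that h suppresses completely cannot be recovered, which is why measurement signals such as the sine sweep of figure 29.7 cover the whole band.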
Figure 29.7 Sine-sweep method of measuring the impulse response. (a) Sonogram of a sine sweep plotted as a function of frequency versus time. The duration is 5 seconds. (b) Sonogram of a sine sweep in a reverberant hall. The duration is 8.8 seconds. (c) Time-domain plot of the impulse response of the hall after deconvolving the swept sine out of the signal. The duration is 3.8 seconds. Images courtesy Dr. Shashank Aswathanarayana.
Figure 29.8 Example of time-smearing. (a) Original source, a cowbell strike with a sharp attack. (b) Result of convolution of the cowbell with itself. Notice the time-smearing in the attack.
Figure 29.9 Ring modulation as convolution. These images show the representation of spectra inside the FFT, where a symmetrical representation applies. (a) Sinusoid at 100 Hz. (b) Sinusoid at 1 kHz. (c) Convolution of (a) and (b).
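Multiplying two signals in time convolves their spectra. In the symmetric FFT representation, each sinusoid is a pair of lines at ±f, so the product of 100 Hz and 1 kHz sinusoids has lines at ±900 Hz and ±1100 Hz. This minimal sketch checks that against the identity sin a · sin b = ½[cos(a − b) − cos(a + b)]; `ring_modulate` is a hypothetical name.

```python
import math

def ring_modulate(f1, f2, sr, n_samples):
    """Sample-by-sample product of two sine waves."""
    return [math.sin(2 * math.pi * f1 * n / sr) * math.sin(2 * math.pi * f2 * n / sr)
            for n in range(n_samples)]

def difference_and_sum(f1, f2, sr, n_samples):
    """The same signal written as components at f2 - f1 and f2 + f1."""
    return [0.5 * (math.cos(2 * math.pi * (f2 - f1) * n / sr)
                   - math.cos(2 * math.pi * (f2 + f1) * n / sr))
            for n in range(n_samples)]

rm = ring_modulate(100.0, 1000.0, 48000, 480)
```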
Figure 29.10 Convolution with grains. (a) Sparse cloud of brief grains lasting 0.5 ms each. (b) Tambourine hit. (c) Convolution of (a) and (b) results in multiple tambourine hits, corresponding to the temporal pattern of the cloud. Notice the momentary shift to negative energy caused by the second grain in (a).
Figure 30.1 Circuit of a digital delay line. Notice the similarity between this structure and the filter structures in figure 28.6.
Figure 30.2 Operation of a circular queue to implement a delay line. N is the newest sample. The arrow shows that it is written to the queue. O is the oldest sample. The arrow shows that it is read from the queue. (a) “Before” shows samples in a circular queue at time t. (b) “After” shows samples in the queue at time t + 1, indicating that the space held by the oldest sample at time t has been read out and replaced by a new incoming sample.
Figure 30.3 A two-tap delay line implemented as a circular queue. The two read taps, Tap1 and Tap2, circulate around the queue along with pointers O (old) and N (new). Incoming samples are written into the position occupied by N at each sample period.
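The circular queue in figures 30.2 and 30.3 can be sketched as a ring buffer. Each sample period, the input is written at pointer N, and each tap reads a fixed number of samples behind it. The class and method names below are hypothetical. The example sends an impulse through taps at 2 and 4 samples.

```python
class CircularDelay:
    """Delay line stored in a circular queue (ring buffer) with several read taps."""

    def __init__(self, max_delay):
        self.buf = [0.0] * (max_delay + 1)  # one slot per sample of delay, plus the newest
        self.newest = 0                     # pointer N: where the next input is written

    def tick(self, x, taps):
        """Write one input sample at N, then read each tap d samples behind it (d <= max_delay)."""
        size = len(self.buf)
        self.buf[self.newest] = x
        outs = [self.buf[(self.newest - d) % size] for d in taps]
        # Advance N; the slot after it holds the oldest sample (pointer O), overwritten next tick.
        self.newest = (self.newest + 1) % size
        return outs

line = CircularDelay(max_delay=4)
impulse = [1.0] + [0.0] * 7
out = [line.tick(x, taps=(2, 4)) for x in impulse]   # Tap1 fires at n = 2, Tap2 at n = 4
```

No samples are shifted in memory. Only the pointers move, which is why the circular queue is the usual way to implement long delays.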
Figure 30.4 Echo effect caused by mixture of direct sound with reflected sound.
Figure 30.5 A feedback circuit for repeating echoes.
Figure 30.6 Impulse responses of filters with echo shaping, in which the repetition is suddenly truncated after a stipulated number of echoes (after Putnam 2015). Amplitude is the vertical axis and time (as echo number) is the horizontal axis. (a) Cosine IR. (b) Cosine decay. (c) Approximation of a linear swell.
Figure 30.7 Tape flanging using two analog tape recorders. The playback speed of the second tape recorder varies as an operator applies finger pressure to the flange of the reel.
Figure 30.8 Flanger circuit with feedback, mixing a delayed signal with an original signal. A low-frequency oscillator (LFO) supplies the variation in the delay time around a center delay time. This circuit could be made more sophisticated by inserting multipliers on the delay feedback path and the original signal path, so that one could adjust the ratio between the two signals or invert the phase of the feedback.
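A minimal sketch of the flanger in figure 30.8 follows. A sine LFO sweeps the delay time around a center value, the delayed signal is fed back into the line, and the output mixes the original and delayed signals. Fractional delays are read with linear interpolation. The parameter names and default values (3 ms center, 2 ms depth, 0.5 Hz rate) are assumptions chosen for illustration, not values from the text.

```python
import math

def flanger(x, fs=44100, center_ms=3.0, depth_ms=2.0, rate_hz=0.5,
            feedback=0.5, mix=0.5):
    """Feedback flanger: delay time swept by a sine LFO around a center delay."""
    if fs * (center_ms - depth_ms) / 1000 < 1:
        raise ValueError("minimum delay must be at least one sample")
    size = int(fs * (center_ms + depth_ms) / 1000) + 2   # circular buffer length
    buf = [0.0] * size
    w = 0                                                # write position
    y = []
    for n, xn in enumerate(x):
        # LFO: current delay in (fractional) samples
        d = fs * (center_ms + depth_ms * math.sin(2 * math.pi * rate_hz * n / fs)) / 1000
        # Read the delayed sample with linear interpolation
        pos = (w - d) % size
        i0 = int(pos)
        frac = pos - i0
        delayed = (1 - frac) * buf[i0] + frac * buf[(i0 + 1) % size]
        # Write the input plus feedback, then advance the write position
        buf[w] = xn + feedback * delayed
        w = (w + 1) % size
        # Mix the original (dry) and delayed (wet) signals
        y.append((1 - mix) * xn + mix * delayed)
    return y
```

A negative `feedback` value inverts the phase of the feedback, which is one of the refinements the caption mentions.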
Figure 31.1 Screenshot of the Melodyne pitch editor by Celemony. Musical notes are indicated in a piano-roll style in the left column. The top row displays music notation. The center portion shows note blobs. Superimposed on the blobs are pitch drift lines showing portamento, vibrato, and other pitch deviations.
Figure 31.2 Time granulation. (a) Time shrinking by extracting separated grains and combining them into a shorter waveform. (b) Time expansion by cloning two copies of each grain. In both cases the local frequency content of the signal is preserved.
Figure 31.3 When two grains are arbitrarily spliced, the end of one grain may not match the beginning of the next grain. This can cause a transient (a click) at the splice point.
Figure 31.4 Granular time stretching and shrinking. (a) Original speech waveform. (b) Waveform in (a) cut into grain segments A through H. (c) Time stretching by a factor of two repeats every grain in (b). (d) Time shrinking throws away every other grain in (b). (e) Waveform (a) pitch-shifted up by an octave. (f) Granulation of (e). (g) Time stretching by a factor of two restores the original duration of the pitch-shifted speech. Note that in practice the waveform is not usually cut with a rectangular envelope as shown here but rather with a smooth bell-shaped envelope.
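The grain operations in figure 31.4 can be sketched with rectangular, non-overlapping grains. The figure draws grains this way, although, as the caption notes, practical systems use smooth, overlapping envelopes. Stretching repeats each grain and shrinking keeps every other one. Because each grain is copied intact, its local frequency content survives. The function names are hypothetical.

```python
def granulate(x, grain_len):
    """Cut x into consecutive grains (rectangular envelope, as drawn in the figure)."""
    return [x[i:i + grain_len] for i in range(0, len(x), grain_len)]

def time_stretch(x, grain_len, factor):
    """Stretch by an integer factor: repeat each grain `factor` times."""
    out = []
    for g in granulate(x, grain_len):
        for _ in range(factor):
            out.extend(g)
    return out

def time_shrink(x, grain_len, keep_every):
    """Shrink: keep one grain out of every `keep_every` grains."""
    out = []
    for g in granulate(x, grain_len)[::keep_every]:
        out.extend(g)
    return out

x = list(range(16))   # sixteen samples = eight two-sample grains, A through H
```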
Figure 31.5 The spectrum of (a) is pitch shifted up to (b) while maintaining the formant peaks (indicated by the arrows).
Figure 31.6 Simple pitch shifting. As the pitch increases the duration shrinks, and vice versa. After McMillen (2015).
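Simple pitch shifting of this kind amounts to resampling. Reading a recording faster raises its pitch and shortens its duration by the same factor. The following minimal sketch uses linear interpolation. The function name is hypothetical. A transposition of s semitones corresponds to a ratio of 2 ** (s / 12).

```python
def resample(x, ratio):
    """Read x at `ratio` input samples per output sample (linear interpolation).
    ratio > 1 raises pitch and shortens the sound; ratio < 1 lowers pitch and lengthens it."""
    out = []
    pos = 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1 - frac) * x[i] + frac * nxt)
        pos += ratio
    return out

ramp = [float(n) for n in range(100)]
octave_up = resample(ramp, 2.0)     # up an octave: half as long
octave_down = resample(ramp, 0.5)   # down an octave: about twice as long
```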
Figure 31.7 Granular pitch-time changing. In this case, the pitch is shifted up (b) and down (c) while maintaining the original duration. After McMillen (2015).
Figure 31.8 Time-scale modification of tracking phase vocoder envelopes. The plots show amplitude on the vertical axis and time on the horizontal axis. (a) Original. (b) Stretched in time. (c) Shrunk in time.
Figure 31.9 SPEAR lets a user select a set of tracks and type in a time-scale factor. The tracks are stretched or shrunk to the new time scale.
Figure 31.10 The SCATTER app slowing down a playback while preserving pitch.
Figure 32.1 The Klangdom at the Hertz-Lab in the Center for Media Technology (ZKM) in Karlsruhe, Germany (c. 2010). Above the audience are 48 Genelec loudspeakers. On the ground are two subwoofers for deep bass.
Figure 32.2 The Acousmonium, a multichannel spatializer designed by the Groupe de Recherches Musicales (GRM), installed in the Olivier Messiaen concert hall, Maison de Radio France, Paris, in 1980. Projecting sound over eighty loudspeakers played through a 48-channel mixer, the Acousmonium achieves a complexity of sound image rivaling that of an orchestra. It lets a composer reorchestrate an electronic composition for Acousmonium spatial performance. (Photograph by Laszlo Ruszka and supplied courtesy of François Bayle and the Groupe de Recherches Musicales.)
Figure 32.3 Edward Kobrin’s HYBRID IV studio set up in Berlin, 1977, featuring a computer-controlled sixteen-channel spatialization system. The loudspeakers are mounted on the walls.
Figure 32.4 Configurations for spatialization of electronic and computer music with a small number of loudspeakers. (a) Basic stereo: LF = left front, RF = right front. (b) Quadraphonic: RR = right rear, LR = left rear. (c) Quadraphonic periphony. The right front and left rear loudspeakers are mounted above ear level, so that when sound pans horizontally it also pans vertically. (d) Five-speaker configuration with a vertical loudspeaker projecting downward. If the ceiling is relatively low and reflective, the loudspeaker can instead be placed on a stand and aimed upward so that the sound is heard reflected from above.
Figure 32.5 The attentive listener can localize a sound source from cues of its horizontal angle, height, and distance. L = loudspeaker.
Figure 32.6 To position a sound source at a point P between the two loudspeakers A and B, ascertain the angle θ of the source measured from the midpoint between A and B. In the middle θ equals 0°. The angle θmax is the maximum angle, typically plus or minus 45°.
Figure 32.7 A linear panning curve is perceived as receding in the middle due to a diminution of intensity. The amplitude curves for each channel are shown at the top; the perceived trajectory is shown at the bottom.
Figure 32.8 A constant-power panning curve maintains the perceived distance and intensity in the middle. The amplitude curves for each channel are shown at the top; the perceived trajectory is shown at the bottom.
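The difference between figures 32.7 and 32.8 comes down to the gain laws. With linear gains, the summed power drops to half (about −3 dB) at the center, so the source seems to recede. With sine/cosine gains, the power stays constant. A minimal sketch follows. The normalized position p (0 = left, 1 = right) is a hypothetical parameter standing in for the angle θ of figure 32.6.

```python
import math

def linear_pan(p):
    """Linear law. p runs from 0 (hard left) to 1 (hard right)."""
    return 1.0 - p, p

def constant_power_pan(p):
    """Sine/cosine law: left**2 + right**2 == 1 at every position."""
    angle = p * math.pi / 2
    return math.cos(angle), math.sin(angle)

for p in (0.0, 0.5, 1.0):
    lin = sum(g * g for g in linear_pan(p))
    cp = sum(g * g for g in constant_power_pan(p))
    print(f"p={p}: linear power {lin:.3f}, constant-power {cp:.3f}")
```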
Figure 32.9 Separating the three spatial components L = left, R = right, and M = mono. (0) The original stereo signal LMR can be considered as a sum of LM and MR. (1) To extract the L and R signals, subtract MR from LM. (2) To extract M, subtract LR from LMR. Now we can manipulate L, M, and R independently to vary the stereo image.
Figure 32.10 Level indicators for simulating a sound that moves away from the listener. D = direct; R = reverberated. (a) Close sound in which the direct sound is much higher in amplitude than the reverberated sound. (b) Distant sound. The overall amplitude is lower, and the ratio of the direct to the reverberated sound has narrowed.
Figure 32.11 (a) A sound moving toward the listener has positive (P) radial velocity; a sound moving away has negative (N) radial velocity. (b) A sound moving in a circle around the listener is always the same distance away and so has zero radial velocity.
Figure 32.12 Doppler shift wavefront patterns. (a) Static sound: wavefronts arrive at constant intervals, so there is no pitch change. (b) S1, S2, and S3 represent successive positions of a moving sound source; the result is an upward pitch shift.
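The pitch change in panel (b) follows the standard Doppler relation for a moving source and a stationary listener, f' = f · c / (c − v), where v is the radial velocity (positive when approaching, as in figure 32.11). The sketch below assumes a speed of sound of 343 m/s, which corresponds to air at about 20 degrees C. The function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at about 20 degrees C)

def doppler_frequency(f_source, radial_velocity):
    """Frequency heard by a stationary listener. radial_velocity > 0 means the
    source approaches; < 0 means it recedes; 0 (e.g., circular motion) means no shift."""
    return f_source * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_velocity)

print(doppler_frequency(440.0, 34.3))   # approaching at 10% of c: pitch rises
```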
Figure 32.13 HRTF spectra (amplitude x versus frequency y) for sounds heard at 90° (straight into left ear) at various altitudes. (a) 15° above ear level. (b) Ear level. (c) Below ear level. (After Rodgers 1981; published courtesy of the Audio Engineering Society.)
Figure 32.14 HRTF spectra for two different persons. Left ear, with source at ear level. The frequency plot goes from 1 to 18 kHz. The vertical line indicates the 8 kHz mark. The differences between the two HRTFs above 8 kHz are striking. The horizontal lines indicate 20 dB differences.
Figure 32.15 Leslie rotating loudspeaker patent, 1949.
Figure 32.16 Rotating spherical loudspeaker constructed in 1959 at the Experimental Studio Gravesano.
Figure 32.17 Karlheinz Stockhausen with rotating loudspeaker mechanism in 1960. Four microphones are positioned around the loudspeaker turntable, which was manipulated by hand. A later version was controlled by a motorized mechanism. (Photograph copyright WDR, Cologne.)
Figure 32.18 The conventional dispersion pattern of a loudspeaker is broad, whereas a superdirectional sound beam is narrow.
Figure 32.19 The Yamaha YSP-100 projects sounds from forty-two loudspeakers to create surround sound effects.
Figure 32.20 A 2016 CREATE Ensemble performance on the exterior of the UCSB AlloSphere, a 54.1 Meyer Sound system. The audience can be seen inside the AlloSphere.
Figure 32.21 Vector base amplitude panning. The three-dimensional unit vectors I1, I2, and I3 define the directions of loudspeakers 1, 2, and 3. The virtual sound source p is a linear combination of I1, I2, and I3, each weighted by a gain factor. Using these three loudspeakers, virtual sources can be created anywhere within the active triangle shown. This can be generalized to arbitrary spatial configurations of multiple loudspeakers. After Pulkki (1997).
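The VBAP gains can be found by solving p = g1·I1 + g2·I2 + g3·I3 for the three gains and then normalizing them so that their squares sum to 1. A negative gain means p lies outside the active triangle. The following dependency-free sketch uses Cramer's rule. The vectors are named l1, l2, l3 in the code, and the function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def vbap_gains(p, l1, l2, l3):
    """Solve p = g1*l1 + g2*l2 + g3*l3 by Cramer's rule, then normalize to unit power."""
    cols = [l1, l2, l3]                                  # loudspeaker unit vectors as columns
    a = [[cols[j][i] for j in range(3)] for i in range(3)]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("loudspeaker vectors are coplanar")
    gains = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = p[i]                              # replace column k with the source vector
        gains.append(det3(ak) / d)
    if min(gains) < -1e-9:
        raise ValueError("source lies outside the active triangle")
    norm = math.sqrt(sum(g * g for g in gains))
    return [g / norm for g in gains]
```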
Figure 32.22 Sennheiser AMBEO VR microphone.
Figure 32.23 Ambisonic B-format configuration to spatialize two mono input channels to an arbitrary number of loudspeakers. The mono stems could be generated by a traditional DAW. The boxes marked FX indicate the point at which plug-ins for sound field manipulation would be inserted before being sent to the four channels W, X, Y, and Z of the B format. The effects could include panning, sound field rotations, tilting, and reverberation, for example. The loudspeakers are shown in a vertical line. In reality they need to be in a regular geometrical configuration around the audience.
Figure 32.24 Visual representation of the Ambisonic B-format spherical harmonics components up to third order. Dark portions represent regions where the polarity is inverted. Note how the first two rows correspond to omnidirectional and figure-eight microphone polar patterns.
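Encoding a mono signal into the four first-order B-format channels W, X, Y, and Z amounts to weighting it by the omnidirectional and figure-eight patterns shown in the first two rows of figure 32.24. The following sketch assumes the traditional (FuMa) convention, in which W is scaled by 1/sqrt(2), X points to the front, Y to the left, and Z up. Other conventions, such as AmbiX with ACN ordering and SN3D normalization, order and scale the channels differently. An order-N full-sphere signal has (N + 1)² channels, so third order has 16.

```python
import math

def encode_bformat(sample, azimuth_deg, elevation_deg):
    """First-order B-format encoding (FuMa weighting assumed).
    Azimuth is counterclockwise from the front; elevation is upward from ear level."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                 # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back figure-eight
    y = sample * math.sin(az) * math.cos(el)  # left-right figure-eight
    z = sample * math.sin(el)                 # up-down figure-eight
    return w, x, y, z
```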
Figure 32.25 Multiple loudspeakers synthesizing a wave front. (a) Huygens's principle of 1678. (b) Wave field synthesis with an array of loudspeakers. The signal of each loudspeaker is individually scaled and delayed to approximate (a).
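The idea in panel (b) can be approximated by delaying each loudspeaker by its extra distance to a virtual point source and attenuating it by distance. This is a deliberately simplified Huygens-style sketch, not a full wave field synthesis driving function, which also applies a spectral correction filter and different gain laws. The speed of sound (343 m/s), the 1/r gain law, and the function name are assumptions.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)

def huygens_delays(source, speakers, fs=48000):
    """Relative delay (in samples) and relative gain for each loudspeaker so that
    their combined output approximates a wavefront spreading from `source`."""
    dists = [math.dist(source, s) for s in speakers]
    nearest = min(dists)
    delays = [round((d - nearest) / C * fs) for d in dists]   # extra travel time
    gains = [nearest / d for d in dists]                      # simple 1/r attenuation
    return delays, gains

# Three loudspeakers on a line, virtual source 2 m behind the middle one.
delays, gains = huygens_delays((0.0, -2.0), [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)])
```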
Figure 32.26 EMPAC wave field synthesis array consisting of 558 independently controllable loudspeakers. Courtesy of Rensselaer Polytechnic Institute.
Figure 32.27 Simulation of 56 monopole loudspeakers synthesizing a plane wave that propagates in a positive direction (i.e., toward the top in this overhead diagram). In this simulation all loudspeakers are active. An amplitude scale in dB is shown at the right. After Ahrens, Rabenstein, and Spors (2014).
Figure 33.1 Reverberation is caused by reflections of sound by surfaces in a space. The dark line is the path of direct sound; all other lines represent sonic reflections that arrive later than the original because of their longer paths.
Figure 33.2 Boston Symphony Hall. The statues and niches around the sides of the hall multiply the reflections.
Figure 33.3 Impulse response envelope of a reverberant hall. The components of reverberation are shown as the predelay (a 25 ms delay before the direct sound reaches the listener), the early reflections, and the fused late reverberation. Note that the separation between these components is idealized.
Figure 33.4 The reverberation time, or RT60, is the time it takes for the reverberation to decay by 60 dB from its peak level.
Figure 33.5 To create an acoustic ambience effect, sound can be fed into an echo chamber via a loudspeaker. The reflected, indirect sound is picked up by a microphone at the other end of the room. Ideally, the room is irregularly shaped. To maximize and randomize the reflections, the room should be fitted with sound diffuser panels, which contain many recesses spaced at different distances. As sound waves strike them, they are reflected at different delay times, depending on which recess they hit. This diffusion effect tends to eliminate standing waves (resonant frequencies in the room) caused by parallel walls. Another way to increase reflections is to install reflective sound pillars made of polished marble that will scatter the sound in all directions.
Figure 33.6 A recursive comb filter for reverberation. (a) Circuit of comb filter with coefficients D (number of samples to delay) and g (amount of feedback). (b) Impulse response, as a series of echoes.
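The comb filter in figure 33.6 computes y[n] = x[n] + g · y[n − D], so an impulse comes back every D samples, scaled by a further factor of g each time. Each trip attenuates the signal by 20·log10(g) dB, and there are fs · RT60 / D trips in RT60 seconds. A decay of 60 dB in RT60 seconds therefore requires g = 10^(−3D / (fs · RT60)). The following minimal sketch uses hypothetical function names.

```python
def comb_filter(x, delay, g, tail=0):
    """Recursive comb filter: y[n] = x[n] + g * y[n - delay]; `tail` extra samples let echoes ring out."""
    y = []
    for n in range(len(x) + tail):
        xn = x[n] if n < len(x) else 0.0
        fb = y[n - delay] if n >= delay else 0.0
        y.append(xn + g * fb)
    return y

def comb_gain_for_rt60(delay, rt60, fs):
    """Feedback gain that makes the echoes decay by 60 dB in rt60 seconds."""
    return 10 ** (-3 * delay / (fs * rt60))

echoes = comb_filter([1.0], delay=3, g=0.5, tail=9)   # 1 at n=0, 0.5 at n=3, 0.25 at n=6, ...
```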
Figure 33.7 A first-order allpass network. (a) By adding −g times the input into the output of the delay, a comb filter is changed into an allpass filter. (b) The impulse response of an allpass filter is an exponentially decaying series of echo pulses. This makes the allpass filter useful as a building block of reverberators.
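In difference-equation form, the allpass in figure 33.7 is y[n] = −g·x[n] + x[n − D] + g·y[n − D]. Its impulse response is −g followed by echoes of (1 − g²)·g^(k−1) every D samples. The total energy of that response equals the energy of the input impulse, which is what "allpass" means in practice. The following minimal sketch uses a hypothetical function name.

```python
def allpass_filter(x, delay, g, tail=0):
    """Schroeder allpass: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = []
    for n in range(len(x) + tail):
        xn = x[n] if n < len(x) else 0.0
        xd = x[n - delay] if 0 <= n - delay < len(x) else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y.append(-g * xn + xd + g * yd)
    return y

h = allpass_filter([1.0], delay=2, g=0.5, tail=200)   # -0.5, 0, 0.75, 0, 0.375, ...
```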
Figure 33.8 Schroeder’s original reverberator designs. (a) Parallel comb filters fed into two allpass filter stages. (b) Four allpass filter stages in series.
Figure 33.9 In Schroeder’s later designs, a multitap delay line simulated the early reflections of sound in a concert hall.
Figure 33.10 The impulse response of an oscillatory allpass unit reverberator.
Figure 33.11 Simplified view of a spatial reverberator after Kendall, Martens, and Decker (1989). This system models a space by summing the contributions of M local reverberators, which ultimately generate N output channels. F is a filter that imposes spectrum changes caused by distance and air absorption. R is a local reverberant stream, modeling the reverberation in a subspace of the total room. D is a directionalizer that filters the sound according to its position in the virtual space. The implemented system has two independent reflection processors and some cross-feeding in the reverberant streams.
Figure 33.12 Interface of the AudioEase Altiverb convolving reverberator, with the IR of the Amsterdam Concertgebouw selected.
Figure 33.13 A velvet noise sequence consists of zero-valued samples interspersed with +1 and −1.
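One common way to generate such a sequence is to divide time into a grid of fs / density samples and place a single +1 or −1 at a random position inside each grid period. The following sketch follows that recipe. The density parameter (pulses per second), the seed, and the function name are assumptions for illustration.

```python
import random

def velvet_noise(length, density, fs, seed=None):
    """Sparse velvet noise: one randomly signed unit pulse at a random position
    inside each grid period of fs / density samples; all other samples are zero."""
    grid = fs / density
    if grid <= 1:
        raise ValueError("density must be lower than the sample rate")
    rng = random.Random(seed)
    seq = [0.0] * length
    m = 0
    while True:
        pos = int(m * grid + rng.random() * (grid - 1))   # stays inside grid period m
        if pos >= length:
            break
        seq[pos] = rng.choice((-1.0, 1.0))
        m += 1
    return seq

s = velvet_noise(4410, density=2000, fs=44100, seed=1)   # about 200 pulses in 0.1 s
```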
Figure 33.14 Acoustic ray-tracing model of room. Source (A) and receiver (B). After Savioja and Svensson (2015).
Figure 33.15 A three-port waveguide network with six nodes. This waveguide propagates energy out of the outputs, meaning that it is an open network that eventually loses energy, as a reverberant hall does.
Figure 33.16 A third-order FDN (after Jot and Chaigne 1991 and Smith 2010). The top rectangles are three comb filters. Their output signal is fed back through a feedback matrix shown at the bottom. An additional filter E(z) is applied to the nondirect signal. This filter is called a tone corrector by Jot and Chaigne (1991). It serves to equalize the energy regardless of the reverberation time in each band. In other words, if the user adjusts the decay time in a band, E(z) will impose a corresponding alteration in gain in that band so the total energy in the band's impulse response is unchanged.
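A minimal sketch of a third-order FDN follows. Three delay lines are attenuated by per-line gains, mixed by an orthogonal feedback matrix, and fed back to their own inputs. Several choices here are assumptions rather than details from the figure: the Householder matrix A = I − (2/3)·(all-ones matrix), the prime delay lengths, and the broadband gains. The tone corrector E(z) and any per-band absorption filters are omitted.

```python
def fdn3(x, delays=(149, 211, 263), gains=(0.8, 0.8, 0.8), tail=0):
    """Third-order feedback delay network (no tone corrector E(z))."""
    n_lines = 3
    bufs = [[0.0] * d for d in delays]       # one circular buffer per delay line
    idx = [0] * n_lines
    y = []
    for n in range(len(x) + tail):
        xn = x[n] if n < len(x) else 0.0
        # Delay-line outputs, each attenuated by its gain
        outs = [bufs[i][idx[i]] * gains[i] for i in range(n_lines)]
        total = sum(outs)
        # Householder feedback matrix: (A o)_i = o_i - (2/N) * sum(o), which is orthogonal
        feedback = [o - (2.0 / n_lines) * total for o in outs]
        for i in range(n_lines):
            bufs[i][idx[i]] = xn + feedback[i]
            idx[i] = (idx[i] + 1) % delays[i]
        y.append(total)
    return y

h = fdn3([1.0], tail=20000)   # impulse response
```

Because the matrix is orthogonal, every pass around the loop scales the energy by the squared gains, so gains below 1 guarantee that the response decays.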
Figure 33.17 Reverberation by granular convolution. (a) Speech input: “Moi, Alpha Soixante.” (b) Granular impulse response, consisting of 1,000 9 ms sinusoidal grains centered at 14,000 Hz, with a bandwidth of 5,000 Hz. (c) Convolution of (a) and (b). (d) Mixture of (a) and (c) in a proportion of 5:1, creating reverberation around the speech.
Figure 34.1 Manometric flames for waveform analysis. (a) Apparatus. Sounds picked up by the mouthpiece modulate the Bunsen burner flame within the box. When the box is rotated, mirrors on the outside of the box project the flame as a continuous band with jagged edges or teeth corresponding to the pitch and spectrum of the input sound. (b) Flame pictures of the French vowel sounds [OU], [O], and [A] by R. Koenig, sung at the pitches C1 (bottom of each group), G1 (middle of each group), and C2 (top of each group). (After Tyndall 1875.)
Figure 34.2 Rudolf Koenig's version of the Phonautograph for recording images of sound waveforms. (a) Phonautograph mechanism. (b) Phonautograph records.
Figure 34.3 Melodic trace like that of a Melograph for two seconds of an Indian singer. Time moves horizontally. (a) Fundamental pitch trace. (b) Amplitude trace. (After Gjerdingen 1988.)
Figure 34.4 Screenshot of Melodyne Assistant. Each blob is a note. The squiggly line in the center of each note is its detailed pitch curve. If desired, these curves can be straightened out and the notes can be repositioned to another pitch.
Figure 34.5 Vocal pitch curve derived automatically by Audionamix Trax.
Figure 34.6 Zero-crossing pitch detector. (a) By measuring the interval between the zero crossings (marked Δ), we obtain a clue as to the lowest periodicity of the signal. (b) For signals with a strong fundamental, this works regardless of the presence of high-frequency components superposed on the signal, provided that the PD ignores the rapid, low-amplitude zero-point variations caused by the high-frequency components.
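In its simplest form, the detector in panel (a) takes the average spacing between upward zero crossings as the period. The following minimal sketch uses a hypothetical function name. It contains none of the safeguards that panel (b) calls for, so spurious crossings caused by high-frequency components would corrupt its estimate.

```python
import math

def zero_crossing_pitch(x, fs):
    """Estimate frequency from the mean spacing of upward (negative-to-nonnegative) zero crossings."""
    ups = [n for n in range(1, len(x)) if x[n - 1] < 0.0 <= x[n]]
    if len(ups) < 2:
        return None
    mean_period = (ups[-1] - ups[0]) / (len(ups) - 1)   # in samples
    return fs / mean_period

fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs + 0.3) for n in range(800)]
print(zero_crossing_pitch(tone, fs))   # close to 200 Hz
```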
Figure 34.7 Autocorrelation scheme. The input signal is windowed, and the windowed segment is compared with versions of itself delayed by one sample, two samples, and so on, up to m samples. The strongest correlation is estimated as the dominant or fundamental pitch.
Figure 34.8 Autocorrelation of a sine wave is itself a sinusoidal wave. O indicates original signal; D indicates delayed signal. (The text explains cases [a] through [e].) The autocorrelation function is plotted at the bottom.
Figure 34.9 Autocorrelation functions of periodic signals are themselves periodic functions of time. (a) Autocorrelation of a signal with five harmonics, including the fundamental with a period of 6.7 ms or 149 Hz (close to D3). The autocorrelation is periodic, but its harmonic amplitudes are different from the input. Notice the peak corresponding to the fundamental. (b) Autocorrelation of a signal with only three harmonics: the fifth, sixth, and seventh. The autocorrelation is periodic with a period of 6.7 ms equal to the missing fundamental (implied pitch) of the waveform. (After Moorer 1975.)
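The scheme of figures 34.7 through 34.9 can be sketched as follows. Compare the signal with delayed copies of itself over a range of plausible lags and keep the lag with the strongest correlation. The usage example mirrors figure 34.9(b). It builds a signal from only the fifth, sixth, and seventh harmonics of 150 Hz, and the autocorrelation still peaks at the period of the missing fundamental. The sample rate (9 kHz, chosen so the period is exactly 60 samples), the 50 to 1000 Hz search range, and the function name are assumptions.

```python
import math

def autocorr_pitch(x, fs, fmin=50.0, fmax=1000.0):
    """Return fs / lag for the lag with the largest autocorrelation in [fs/fmax, fs/fmin]."""
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))   # compare with delayed copy
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 9000
missing = [sum(math.sin(2 * math.pi * k * 150 * n / fs) for k in (5, 6, 7)) for n in range(1024)]
print(autocorr_pitch(missing, fs))   # the implied 150 Hz fundamental, absent from the signal
```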
Figure 34.10 Pitch detector based on an adaptive filter scheme. Notice the feedback loop from the estimate back to the filter.
Figure 34.11 Plots generated by frequency-domain pitch tracking of the estimated pitch of the first eight measures of Partita III by J. S. Bach. The vertical axis is divided into semitones of the equal-tempered scale, from C4 to C7. The horizontal axis is time. (a) Computer-synthesized pitches. (b) Studio recording. (c) Reverberant recording. (After Beauchamp, Maher, and Brown 1993.)
Figure 34.12 Scheme for cepstrum computation.
Figure 34.13 Cepstrum plot from a note of a trumpet solo recorded in a large reverberant hall. The note is 396 Hz. The peak marked by an asterisk indicates the period of the signal, about 2.52 ms, which corresponds to the detected pitch. Notice that the cepstrum peak appears clearly even in the presence of reverberation. (After Moorer 1975.)
Figure 34.14 Cepstrum separation of vocal cord impulse response from the vocal tract impulse response. Applying the log function separates the thin wiggly lines (corresponding to the excitation) from the thick undulating broad spectrum (corresponding to the impulse response or resonance).
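A common formulation of the cepstrum is the inverse Fourier transform of the log magnitude spectrum. Taking the log turns the regular ripple of harmonics into a peak at a quefrency equal to the pitch period. The following dependency-free sketch includes a small radix-2 FFT. The eps guard against log(0), the 80 to 400 Hz search range, the 8 kHz sample rate, and all function names are assumptions. The test signal is a band-limited sawtooth with a period of 64 samples.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def ifft(X):
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]

def real_cepstrum(x, eps=1e-6):
    """c = IFFT(log(|FFT(x)| + eps)); eps avoids log(0) in empty bins."""
    log_mag = [math.log(abs(v) + eps) for v in fft(x)]
    return [v.real for v in ifft(log_mag)]

def cepstral_pitch(x, fs, fmin=80.0, fmax=400.0):
    c = real_cepstrum(x)
    lo, hi = int(fs / fmax), int(fs / fmin)   # quefrency range, in samples
    q = max(range(lo, hi + 1), key=lambda i: c[i])
    return fs / q

fs = 8000
saw = [sum(math.sin(2 * math.pi * k * n / 64) / k for k in range(1, 21)) for n in range(1024)]
print(cepstral_pitch(saw, fs))   # period of 64 samples: 125 Hz
```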
Figure 34.15 Schema of pitch detector based on a model of the human auditory system.
Figure 35.1 Deriving a data-reduced detection function from a preprocessed audio signal.
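One of the simplest data-reduced detection functions is sketched below. It measures the energy per frame and keeps only the increases from one frame to the next (half-wave rectification), so that peaks mark likely note onsets. The frame size, the threshold, and the function names are assumptions. As figure 35.2 shows, such a time-domain function struggles when notes overlap without clear energy rises.

```python
def detection_function(x, frame=256):
    """Energy per frame, then the half-wave rectified frame-to-frame increase."""
    energies = [sum(v * v for v in x[i:i + frame]) for i in range(0, len(x) - frame + 1, frame)]
    return [max(0.0, energies[k] - energies[k - 1]) for k in range(1, len(energies))]

def onset_times(x, frame=256, threshold=1.0):
    """Sample positions of frames whose energy rise exceeds the threshold."""
    df = detection_function(x, frame)
    # df[j] compares frame j+1 with frame j, so the onset is at the start of frame j+1
    return [(j + 1) * frame for j, v in enumerate(df) if v > threshold]

# Two synthetic "notes" separated by silence
x = [0.0] * 1024 + [0.5] * 1024 + [0.0] * 1024 + [1.0] * 1024
```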
Figure 35.2 A problem case for time-domain event detection. (a) Sequence of notes. (b) Time-domain signal generated by a vibraphone playing these notes with the sustain pedal pushed down.
Figure 35.3 Spectral signatures of four source sounds. Left to right: vocals, drums, bass, and guitar. From Cano et al. (2019).
Figure 35.4 Automatic transcription from audio. Top: Audio waveform. Center: Mid-level and parametric representation of pitch, onset, offset, stream, and loudness. Bottom: Music notation using note name, key, rhythm, and instrument in score time. After Duan and Benetos (2015).
Figure 35.5 Strategy for an automatic music scribe developed by James A. Moorer in 1975.
Figure 35.6 Comparison of the original score (top) with the transcription from acoustic performance accomplished by Moorer’s system (bottom). The lengths of longer notes are underestimated, and a note is missing in the penultimate measure. The most conspicuous change, however, is due to the fact that the guitar was mistuned a half step high. The literal-minded computer faithfully reports the score mistuned a semitone high throughout.
Figure 35.7 WABOT-2, a musical robot developed in 1985 at Waseda University in Japan, with further engineering by Sumitomo Corporation.
Figure 35.8 A training pair matching two images of music. (a) A snippet of music notation (180 by 280 pixels). (b) An audio clip in the form of a log-frequency spectrogram representation of ninety-two frames and forty-two frequency bins. Pitches stand out as black horizontal lines (Müller et al. 2019).
Figure 35.9 Mont-Reynaud’s tempo tracker.
Figure 35.10 Deleterious effects of quantization. (a) Musical input written appropriately. (b) Transcription by music score editor using quantization according to a sixty-fourth-note grid.
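The distortion in figure 35.10 is easy to reproduce. Quantization snaps each onset to the nearest grid point, and a grid that does not match the music's subdivision rewrites the rhythm. In the sketch below, onsets are measured in quarter-note beats, so a sixty-fourth-note grid is 1/16 beat. The slightly uneven "performed" triplet values are hypothetical.

```python
def quantize(onsets, grid):
    """Snap onset times (in beats) to the nearest multiple of `grid` beats."""
    return [round(t / grid) * grid for t in onsets]

performed = [0.0, 0.34, 0.66, 1.0]           # a played triplet, slightly uneven
print(quantize(performed, 1 / 16))           # sixty-fourth grid: triplet becomes 5/16 and 11/16
print(quantize(performed, 1 / 12))           # triplet-sixteenth grid: close to 1/3 and 2/3
```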
Figure 35.11 A rhythmic grouping problem. (a) Sequence of notes as seen by a rhythmic parser. (b) Plausible interpretation of (a).
Figure 36.1 Static spectrum plots. (a) Line spectrum amplitude-versus-frequency plot of the sustained portion of a trumpet tone. Each line represents the strength of a harmonic of the fundamental frequency of 309 Hz. Linear amplitude scale. (b) Spectrum of trumpet tone in (a) plotted on a logarithmic (dB) scale, which compresses the plot into a narrower vertical band. (c) Spectrum plot in a continuous form, showing the outline of the formant peaks for a vocal sound ah. Linear amplitude scale. (Plots courtesy of Aldo Piccialli, Department of Physics, Federico II University of Naples.)
Figure 36.2 Time-varying spectra plotted on a linear amplitude scale. Time moves from front to back. (a) Sine wave at 1 kHz. (b) Flute playing fluttertongue at pitch E4. (c) Triangle, hit once. Notice the beating frequencies as one partial goes high while another goes low.
Figure 36.3 Images from real-time waterfall displays. (a) Synthetic trumpet tone. The most recent time is at the front. The frequency scale is logarithmic, going from left to right. The fundamental frequency is approximately 1 kHz. Amplitude is plotted vertically on a logarithmic dB scale. (b) Vocal melody. The most recent time is at the front. Low frequencies are at left. (Images courtesy of A. Peevers, Center for New Music and Art Technologies, University of California, Berkeley.)
Figure 36.4 Sonogram plots. (a) Struck tam-tam, time-domain waveform on top. In the sonogram below it, the vertical axis is frequency, and the horizontal axis is time. This sonogram uses 1,024 points of input data and a Hamming window. The plot has a frequency resolution of 43 Hz and a time resolution of 1 ms. The analysis bandwidth is 0 to 22 kHz, and the measured dynamic range is −10 to −44.5 dB, plotted on a linear amplitude scale. (b) Sonogram of speech plotted to 12 kHz.
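The stated 43 Hz frequency resolution follows from the bin spacing of a 1,024-point transform. This check assumes a sample rate of 44.1 kHz, which is not given in the caption but is consistent with the 0 to 22 kHz analysis bandwidth, since that band is the range up to the Nyquist frequency:

```latex
\Delta f = \frac{f_s}{N} = \frac{44\,100\ \text{Hz}}{1024} \approx 43.07\ \text{Hz},
\qquad
f_{\text{Nyquist}} = \frac{f_s}{2} = 22.05\ \text{kHz}
```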
Figure 36.5 The grave of Fourier, Père Lachaise Cemetery, Paris.
Figure 36.6 James Beauchamp performing sound analysis experiments at the University of Illinois around 1966.
Figure 36.7 Heterodyne filter analysis. (a) Product of an input signal (a 100 Hz sine wave) with an analysis signal (a 100 Hz sine wave). The result is entirely positive, indicating strong energy at 100 Hz. (b) Product of an input signal (a 200 Hz sine wave) with an analysis signal (a 100 Hz sine wave). The result is scattered positive and negative energy, indicating no strong energy at 100 Hz in the input signal.
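The two cases in figure 36.7 can be checked numerically. When the analysis sinusoid matches the input, the product is a squared sine, which is never negative and averages 0.5. When the frequencies differ, positive and negative products cancel. This sketch uses only an in-phase (sine) analysis signal, as in the figure. A practical heterodyne analyzer also multiplies by a cosine so that the result does not depend on the input's phase. The 8 kHz rate and the function name are assumptions.

```python
import math

def heterodyne_product(x, f_analysis, fs):
    """Multiply the input, sample by sample, by an analysis sinusoid at f_analysis."""
    return [v * math.sin(2 * math.pi * f_analysis * n / fs) for n, v in enumerate(x)]

fs = 8000
one_second = range(fs)
at_100 = [math.sin(2 * math.pi * 100 * n / fs) for n in one_second]
at_200 = [math.sin(2 * math.pi * 200 * n / fs) for n in one_second]

same = heterodyne_product(at_100, 100, fs)    # case (a): never negative
other = heterodyne_product(at_200, 100, fs)   # case (b): positive and negative values
print(sum(same) / fs, sum(other) / fs)        # averages: about 0.5 and about 0
```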
Figure 37.1 Fourier sum. The bottom waveform is the sum of the four periodic sinusoids above it. Each of the four sinusoids has a different starting phase.
Figure 37.2 Windowing an input signal. The extracted segment has an implicit square window W1, which distorts the analysis. The smooth window W2 reduces the distortion.
图 37.3 矩形窗口对频谱图的影响。(a)一个矩形窗口围绕 177 Hz 正弦波的八个周期。(b)图 (a) 中的频谱显示了 22 Hz 至 4,000 Hz 范围内的能量。理想情况下,正弦波的频谱应该是一条直线。然而,矩形窗口会将能量分散到输入频率的上下。
Figure 37.3 The effect of a rectangular window on the spectrum plot. (a) A rectangular window around eight periods of a sinusoid at 177 Hz. (b) Spectrum of (a) indicating energy from 22 Hz to 4,000 Hz. Ideally, the spectrum of a sinusoid should be a single line. Instead, the rectangular window scatters the energy above and below the input frequency.
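The following sketch measures the spread of energy away from the input frequency for two windows: the implicit rectangular window and a smooth (Hann) window. It assumes a hypothetical 64-sample frame containing 8.5 cycles, so the sinusoid does not fall on a bin. It uses a naive DFT for clarity rather than an FFT.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive DFT magnitudes for bins 0..N/2 (fine for short illustrative frames)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def leakage_db(cycles, n=64, window=None, far_bin=24):
    """Level at a distant bin relative to the spectral peak, in dB."""
    x = [math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
    if window is not None:                      # no window = rectangular
        x = [s * w for s, w in zip(x, window)]
    mags = dft_magnitudes(x)
    return 20 * math.log10(max(mags[far_bin], 1e-12) / max(mags))
```

With these settings, the rectangular frame leaves energy roughly 30 dB below the peak at bin 24. The Hann-windowed frame pushes that residue far lower.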
图 37.4 高斯窗函数(左)及其频谱(右)。
Figure 37.4 Gaussian window function (left) and its spectrum (right).
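A Gaussian window can be generated directly from its formula. In this sketch, the width parameter `sigma` (the standard deviation as a fraction of half the window length) is an assumed convention, since implementations parameterize the width in different ways. In the continuous case, the transform of a Gaussian is itself Gaussian, which is why its spectrum in the figure has no sidelobes to speak of. Truncating the window to a finite length adds small ones.

```python
import math

def gaussian_window(n, sigma=0.4):
    """Gaussian window of length n (n > 1); sigma is relative to half the length."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    center = (n - 1) / 2
    return [math.exp(-0.5 * ((k - center) / (sigma * center)) ** 2) for k in range(n)]
```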
图 37.5 短时傅里叶变换概览,以 FFT 为核心。
Figure 37.5 Overview of the short-time Fourier transform, with the FFT at its core.
图 37.6 STFT 信号。(a)输入波形。(b)加窗后的信号段。(c)0 至− 80 dB 范围内的幅度谱。(d) −π至π范围内的相位谱。(Serra 1989 年版)
Figure 37.6 STFT signals. (a) Input waveform. (b) Windowed segment. (c) Magnitude spectrum plotted over the range 0 to −80 dB. (d) Phase spectrum plotted over the range −π to π. (After Serra 1989.)
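The STFT signals in Figures 37.5 and 37.6 can be sketched in a few lines. Each step cuts a segment, multiplies it by a window, and takes its DFT. The magnitude of each bin is then reported in dB and its phase in the range −π to π. The choice of a Hann window and a naive DFT in place of an FFT are simplifying assumptions.

```python
import cmath
import math

def stft(x, window_size, hop):
    """Return frames; each frame lists (magnitude_db, phase) for bins 0..N/2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / window_size)
              for k in range(window_size)]
    frames = []
    for start in range(0, len(x) - window_size + 1, hop):
        segment = [x[start + k] * window[k] for k in range(window_size)]
        frame = []
        for b in range(window_size // 2 + 1):
            c = sum(segment[k] * cmath.exp(-2j * math.pi * b * k / window_size)
                    for k in range(window_size))
            mag_db = 20 * math.log10(max(abs(c), 1e-12))  # avoid log(0)
            frame.append((mag_db, cmath.phase(c)))        # phase in [-pi, pi]
        frames.append(frame)
    return frames
```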
图 37.7 重叠相加再合成。灰色区域表示重叠的频谱帧。请注意,为了视觉清晰,我们仅显示五帧。实际应用中,通常每秒分析的声音帧数超过 100 帧。
Figure 37.7 Overlap-add resynthesis. The gray areas indicate overlapping spectrum frames. Note that for visual clarity, we show only five frames. In practice it is typical to use more than 100 frames per second of analyzed sound.
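Overlap-add itself is a simple accumulation. The following sketch assumes the spectrum frames have already been converted back to time-domain segments, for example by an inverse FFT. Each segment is added into an output buffer at an offset of `hop` samples.

```python
def overlap_add(frames, hop):
    """Sum equal-length frames into one buffer, each shifted by hop samples."""
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for i, frame in enumerate(frames):
        for k, value in enumerate(frame):
            out[i * hop + k] += value
    return out
```

Periodic Hann windows at 50 percent overlap sum to a constant 1. For that reason, overlap-adding windowed frames of a signal reproduces the signal everywhere except in the first and last half-window.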
图 37.8 振荡器组再合成。分析数据已转换为一组连续的幅度和频率包络。再合成所需的振荡器数量会根据声音的复杂程度而增减。
Figure 37.8 Oscillator bank resynthesis. The analysis data has been converted into a set of continuous amplitude and frequency envelopes. The number of oscillators needed for the resynthesis grows and shrinks depending on the complexity of the sound.
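Oscillator bank resynthesis can be sketched as one sine oscillator per partial, each driven by its own amplitude and frequency envelope. In this sketch, envelopes are assumed to be plain Python functions of time. Each oscillator's phase integrates the instantaneous frequency so that glides remain continuous.

```python
import math

def oscillator_bank(partials, sample_rate, num_samples):
    """Additive resynthesis from (amp_env, freq_env) pairs, each a function of t."""
    out = [0.0] * num_samples
    for amp_env, freq_env in partials:
        phase = 0.0
        for n in range(num_samples):
            t = n / sample_rate
            out[n] += amp_env(t) * math.sin(phase)
            # Integrate instantaneous frequency so frequency changes stay click-free
            phase += 2 * math.pi * freq_env(t) / sample_rate
    return out
```

A partial whose amplitude envelope is zero contributes nothing and could be skipped. Skipping such partials is one way the number of active oscillators grows and shrinks with the complexity of the sound.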
图 37.9从帧到幅度包络。上图显示了时间 t1 到 t4 的四个连续帧。每帧包含三个频率点。频率点a 的幅度从四个不同的时间映射到包络的断点。
Figure 37.9 From frames to amplitude envelopes. The top part shows four successive frames at times t1 to t4. Each frame has three frequency bins. The amplitude of frequency bin a is mapped from four different times to breakpoints of an envelope.
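The mapping in the figure is a transposition. A list of frames, each holding one amplitude per bin, becomes one breakpoint envelope per bin. The following is a minimal sketch, assuming the frame times are supplied alongside the frames.

```python
def frames_to_envelopes(frames, times):
    """Map each bin index to a list of (time, amplitude) breakpoints.

    frames[i][b] is the amplitude of bin b in the frame taken at times[i].
    """
    num_bins = len(frames[0])
    return {b: [(times[i], frames[i][b]) for i in range(len(frames))]
            for b in range(num_bins)}
```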
图 37.10 频率不确定性。(a)最小的窗口由一个采样点组成,无法揭示其所属波形的任何信息。该样本可能是高频波形(b)或低频波形(c)的一部分。只有扩大窗口大小,我们才能精确地知道窗口中包含哪些频率。
Figure 37.10 Frequency uncertainty. (a) The smallest possible window, consisting of one sample point, reveals nothing about the waveform it could be part of. The sample could be part of a high-frequency waveform (b) or a low-frequency waveform (c). Only by expanding the window size can we know with precision what frequencies are in the window.
图 37.11 窗口大小与频率分析箱数量的关系。(a)四个样本的窄窗口只能分辨两个频率。(b)十六个样本的宽窗口将频谱分成八个箱。
Figure 37.11 Relationship of window size to the number of frequency analysis bins. (a) A narrow window of four samples can resolve only two frequencies. (b) A wider window of sixteen samples divides the spectrum into eight bins.
图 37.12 声音频率从 2 Hz 变化到 3 Hz 的三个 STFT“快照”。本例中,STFT 的分析单元间隔为 1 Hz。当输入频率为 2.5 Hz 时,它落在分析仪等距频率单元之间,能量分布在整个频谱上。(Hutchins 1984 年)
Figure 37.12 Three STFT “snapshots” of a sound changing frequency from 2 to 3 Hz. The STFT in this case has analysis bins spaced at 1 Hz intervals. When the input frequency is 2.5 Hz, it falls between the equally spaced frequency bins of the analyzer, and the energy is spread across the entire spectrum. (After Hutchins 1984.)
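The effect in the figure can be reproduced with a one-second, 16-sample frame at a hypothetical 16 Hz sample rate, which gives bins spaced 1 Hz apart as in the caption. This sketch counts how many bins hold more than 1 percent of the peak magnitude, using a rectangular window and a naive DFT.

```python
import cmath
import math

def bins_with_energy(freq_hz, sample_rate=16, n=16, threshold=0.01):
    """Count bins 0..N/2 whose magnitude exceeds threshold times the peak."""
    x = [math.sin(2 * math.pi * freq_hz * k / sample_rate) for k in range(n)]
    mags = [abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
            for b in range(n // 2 + 1)]
    peak = max(mags)
    return sum(1 for m in mags if m > threshold * peak)
```

At 2 Hz and 3 Hz, the energy lands in a single bin. At 2.5 Hz, every bin from 0 to 8 is above the threshold.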
图 37.13 (a) 女声“你继续选择”的时域波形。(b) (a) 的声谱图。频率范围扩展至 7 kHz。
Figure 37.13 (a) Time-domain waveform of a female voice saying: “You just keep selecting.” (b) Sonogram of (a). The frequency scale extends to 7 kHz.
图 37.14 声谱图分析和显示中时间与频率的权衡。所有显示均显示以 44.1 KHz 采样的语音。(a)分析窗口长 32 个样本,时间分辨率为 0.725 毫秒,频率分辨率为 1378 Hz。(b)分析窗口长 1,024 个样本,时间分辨率为 23.22 毫秒,频率分辨率为 43.07 Hz。(c)分析窗口长 8,192 个样本,时间分辨率为 185.8 毫秒,频率分辨率为 5.383 Hz。(声谱图由 Gerhard Eckel 使用其 SpecDraw 程序提供。)
Figure 37.14 Time-versus-frequency trade-offs in sonogram analysis and display. All displays show speech sound sampled at 44.1 kHz. (a) Analysis window is 32 samples long, time resolution is 0.725 ms, and frequency resolution is 1378 Hz. (b) Analysis window is 1,024 samples long, time resolution is 23.22 ms, and frequency resolution is 43.07 Hz. (c) Analysis window is 8,192 samples long, time resolution is 185.8 ms, and frequency resolution is 5.383 Hz. (Sonograms provided by Gerhard Eckel using his SpecDraw program.)
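The figures in the caption follow from two ratios. The window duration is N/fs, and the bin spacing is fs/N, so lengthening the window improves one resolution at the expense of the other. The following is a minimal sketch.

```python
def resolution(window_size, sample_rate=44100):
    """Return (time resolution in ms, frequency resolution in Hz) for an N-point window."""
    time_ms = 1000 * window_size / sample_rate   # duration of the window
    freq_hz = sample_rate / window_size          # spacing between analysis bins
    return time_ms, freq_hz
```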
图 37.15 长度为八个样本的分析窗口的不同跳跃大小。h1 和h 2 是每个窗口的起始时间。(a)跳跃大小 = 窗口大小时,窗口不重叠。( b)跳跃大小小于窗口大小时,窗口重叠。在这种情况下,跳跃大小为四个样本。
Figure 37.15 Varying hop size for analysis windows that are eight samples long. h1 and h2 are the starting times for each window. (a) Nonoverlapping windows when hop size = window size. (b) Overlapping windows when hop size is less than window size. In this case the hop size is four samples.
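The window start times in the figure are multiples of the hop size. This sketch lists the starts that fit entirely inside a signal of a given length. The 24-sample signal in the test is a hypothetical length.

```python
def window_starts(signal_length, window_size, hop):
    """Start indices of every analysis window that fits inside the signal."""
    return list(range(0, signal_length - window_size + 1, hop))
```

With eight-sample windows, a hop of 8 gives nonoverlapping windows, and a hop of 4 gives 50 percent overlap.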
图 37.16 峰值识别与跟踪。(a)分离一组频谱峰值。(b)将频率引导与峰值拟合。顶部的引导 1 在三帧后未唤醒,因此被删除。引导 2 仍在休眠状态。引导 3 和 4 处于活动状态。引导 5 从一个新的峰值开始。
Figure 37.16 Peak identification and tracking. (a) Isolation of a set of spectrum peaks. (b) Fitting frequency guides to peaks. Guide 1 at the top did not wake up after three frames, so it is deleted. Guide 2 is still sleeping. Guides 3 and 4 are active. Guide 5 starts from a new peak.
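The guide behavior described in the caption can be sketched with a few simplified rules. These rules are assumptions for illustration and not the exact published algorithm:
- Each guide greedily claims the nearest unclaimed peak within `max_jump`.
- A guide with no match sleeps for that frame.
- A guide that has slept for `max_sleep` frames is deleted.
- Leftover peaks start new guides.

The parameter names are hypothetical.

```python
def pick_peaks(mags, threshold):
    """Return indices of local maxima in a magnitude spectrum that exceed threshold."""
    return [b for b in range(1, len(mags) - 1)
            if mags[b] > threshold and mags[b] >= mags[b - 1] and mags[b] > mags[b + 1]]


def track_peaks(frames_of_peaks, max_jump, max_sleep=3):
    """Fit frequency guides to per-frame peak frequencies.

    Returns (active, deleted). Each guide is a dict holding its current
    frequency, a per-frame history (None while sleeping), and a sleep count.
    """
    active, deleted = [], []
    for peaks in frames_of_peaks:
        unclaimed = sorted(peaks)
        for guide in active:
            # Greedily claim the nearest unclaimed peak within max_jump
            best = min(unclaimed, key=lambda p: abs(p - guide["freq"]), default=None)
            if best is not None and abs(best - guide["freq"]) <= max_jump:
                guide["freq"] = best
                guide["history"].append(best)
                guide["sleep"] = 0
                unclaimed.remove(best)
            else:
                guide["history"].append(None)  # no match: the guide sleeps
                guide["sleep"] += 1
        # Guides that have slept for max_sleep frames are deleted
        deleted += [g for g in active if g["sleep"] >= max_sleep]
        active = [g for g in active if g["sleep"] < max_sleep]
        # Peaks that no guide claimed start new guides
        active += [{"freq": p, "history": [p], "sleep": 0} for p in unclaimed]
    return active, deleted
```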
图 37.17 Xavier Serra 频谱建模合成技术的分析部分。确定性部分采用正弦加法合成方法。信号的随机部分源自确定性(准谐波)部分的再合成与输入波形的 STFT 之间的差异。系统通过对每个残差分量进行包络拟合来简化每个残差分量。包络表示使随机部分更容易被音乐家修改。然后,随机部分的再合成使用这些包络和随机相位分量——相当于滤波后的白噪声。
Figure 37.17 Analysis part of Xavier Serra’s spectral modeling synthesis technique. The deterministic part takes a sinusoidal additive synthesis approach. The stochastic part of the signal derives from the difference between the resynthesis of the deterministic (quasiharmonic) part and the STFT of the input waveform. The system simplifies each residual component by fitting an envelope to it. The envelope representation makes the stochastic part easier for musicians to modify. The resynthesis of the stochastic part then uses these envelopes with a random phase component, which is equivalent to filtered white noise.
图 38.1 Kay 和 Marple (1981) 提出的测量单个输入声音频谱的不同方法。在描述中,PSD 表示功率谱密度。所有情况下,水平刻度均为频率,从 0 到采样率的一半。垂直刻度为振幅,从顶部的 0 dB到底部的-40 dB,以线性方式绘制。(a) 输入源,由三个正弦波和一个噪声带组成。(b) 带双零填充 FFT 的周期图。(c) Blackman-Tukey PSD。(d) 通过 Yule-Walker 方法计算的自回归 PSD。(e) 通过 Burg 方法计算的自回归 PSD。(f) 通过最小二乘方法计算的自回归 PSD。(g) 移动平均 PSD。它与 (c) 相同,因为仅使用了自相关滞后估计。(h) 通过扩展 Yule-Walker 方法计算的 ARMA PSD。(i) Pisarenko 谱线分解。(j) Prony PSD。 (k)通过希尔德布兰德方法的特殊 Prony。(l) Capon 或最大似然法。
Figure 38.1 Different ways of measuring spectrum for a single input sound, from Kay and Marple (1981). In the descriptions, PSD means power spectral density. The horizontal scale in all cases is frequency, from 0 up to half the sampling rate. The vertical scale is amplitude, from 0 dB at the top to −40 dB at the bottom, plotted linearly. (a) Input source, consisting of three sinusoids and a band of noise. (b) Periodogram with double-zero padding FFT. (c) Blackman-Tukey PSD. (d) Autoregressive PSD via Yule-Walker approach. (e) Autoregressive PSD via Burg approach. (f) Autoregressive PSD via least-squares approach. (g) Moving average PSD. It is identical to (c) because only autocorrelation lag estimates were used. (h) ARMA PSD via extended Yule-Walker approach. (i) Pisarenko spectral line decomposition. (j) Prony PSD. (k) Special Prony via Hildebrand approach. (l) Capon or maximum likelihood.
图 38.2 恒定Q 值方法与傅里叶方法的滤波器间距对比。(a)仅使用 43 个滤波器(图中显示 19 个),恒定Q 值方法实现了从 20 Hz 到 21 kHz 的 1/4 倍频程频率分辨率。(b)傅里叶滤波器间距,每 46 Hz 一个频带。尽管使用了几乎 12 倍的滤波器数量(512 个,图中显示 8 个),傅里叶方法仍然无法达到恒定Q 值方法的低频分辨率。傅里叶方法在整个音频带宽内都具有 46 Hz 的分辨率,即使在人耳无法准确分辨这些差异的最高倍频程中也是如此。
Figure 38.2 Spacing of filters for constant Q versus Fourier techniques. (a) Using only 43 filters (19 are shown), the constant Q method achieves 1/4-octave frequency resolution from 20 Hz to 21 kHz. (b) Fourier filter spacing, with a band every 46 Hz. Even with almost 12 times as many filters (512; 8 are shown), Fourier methods still do not match the low-frequency resolution of constant Q methods. The Fourier method will have 46 Hz resolution throughout the audio bandwidth, even in the highest octave where the ear cannot accurately resolve these differences.
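In a constant Q bank, the center frequencies form a geometric series, so every band has the same ratio of center frequency to bandwidth. This sketch places quarter-octave centers upward from 20 Hz, with band edges assumed to lie half a step on either side of each center. The exact filter count depends on how the end bands are placed, so it need not equal the 43 filters in the figure.

```python
def constant_q_bank(f_min=20.0, f_max=21000.0, bands_per_octave=4):
    """Return (center frequencies, Q) for a geometrically spaced filter bank."""
    ratio = 2 ** (1 / bands_per_octave)
    centers = []
    k = 0
    while f_min * ratio ** k <= f_max:
        centers.append(f_min * ratio ** k)
        k += 1
    # Q = center / bandwidth, with band edges half a step below and above the center
    half = 2 ** (1 / (2 * bands_per_octave))
    q = 1 / (half - 1 / half)
    return centers, q
```

A Fourier bank instead spaces its bands a fixed number of hertz apart, so its resolution in octaves is coarse at low frequencies and finer than the ear needs at high frequencies.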
图 38.3 相同时间-频率区域上的小波与短时傅里叶表示。纵轴为频率。右侧,STFT 的时间分辨率在每个频率上都是均匀的。左侧的小波网格在频谱的上部具有更精细的时间分辨率。这是所谓的具有倍频程分辨率的二进小波的表示。其他网格可以定义为具有更高的分辨率(Evangelista 2001)。
Figure 38.3 Wavelet versus short-time Fourier representation over the same time-versus-frequency area. Frequency is on the vertical axis. On the right, the STFT’s time resolution is uniform at every frequency. The wavelet grid on the left has finer time resolution in the upper range of the spectrum. This is a representation of the so-called dyadic wavelet with octave resolution. Other grids can be defined to have greater resolution (Evangelista 2001).
图 38.4 利用频率弯曲小波对时频平面进行平铺。由于每个基元的延迟与频率相关,时频局部化区域呈现出弯曲的边界。摘自 Evangelista (2001)。
Figure 38.4 Tiling the time-frequency plane by means of frequency-warped wavelets. Due to the frequency-dependent delay of each basis element, the time-frequency localization zones are characterized by curved boundaries. From Evangelista (2001).
图 38.5 小波图显示的三个重叠正弦波。小波图分为两部分:左侧显示的模量(或幅度)和相位图或相位图。两者均从左到右显示时间。纵轴以对数刻度绘制频率。两部分的顶部是波形的标准时域图,以供参考。模量图中的深色表示能量。请注意高频“指针”,它们显示了每个正弦波的起始时间。相位图(右侧面板)直接显示了波形的偏移。U 形“山峰”跟随波形的峰值。任何变化都显示为混沌表面,并再次带有指向变化时刻的“指针”。Arfib (1991) 作品。
Figure 38.5 Three overlapping sinusoids shown in a wavelet display. The wavelet display has two parts: the modulus (or magnitude), shown at left, and the phase display or phasogram. Both show time going from left to right. The vertical axis plots frequency on a logarithmic scale. At the top of both parts is a standard time-domain plot of the waveform for reference. In the modulus, darkness indicates energy. Notice the high-frequency “pointers” showing the onset time of each sinusoid. The phasogram (right panel) shows excursions of the waveform directly. The U-shaped “mountains” follow the peaks of the waveform. Any changes show up as chaotic surfaces, again with “pointers” to the instant of change. After Arfib (1991).
图 38.6 对应于下方乐谱的小波变换模量。深色三角形表示八度音程演奏时出现的最大值。源自 Kronland-Martinet 和 Grossman (1991)。
Figure 38.6 Modulus of the wavelet transform corresponding to the music notation written below. Dark triangles indicating maxima occur when octaves play. After Kronland-Martinet and Grossman (1991).
图 38.7 用小波进行瞬态检测。上图显示了时域信号中的毛刺。下图显示了小波表示。高频小波精确地指向了毛刺发生的时间。低频小波(底部的水平带)无法察觉毛刺。Kronland-Martinet (1988) 著。
Figure 38.7 Transient detection by wavelets. The top graph shows a glitch in the time-domain signal. The bottom graph shows the wavelet representation. High-frequency wavelets point precisely to the time of the glitch. The glitch is invisible to the low-frequency wavelets (the horizontal band at the bottom). After Kronland-Martinet (1988).
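The figure's point can be illustrated with the simplest wavelet, the Haar wavelet. This substitution is deliberate: the figure's analysis does not use a Haar wavelet. At the finest scale, each Haar detail coefficient compares two neighboring samples. Smooth low-frequency material therefore yields small details, while a glitch yields one large coefficient at its location. The function names are hypothetical.

```python
import math

def haar_step(x):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    s = math.sqrt(2)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return approx, detail

def locate_glitch(x):
    """Sample index of the pair with the largest finest-scale detail coefficient."""
    _, detail = haar_step(x)
    i = max(range(len(detail)), key=lambda k: abs(detail[k]))
    return 2 * i
```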
图 38.8 一个恒定频率的节律,其节拍随着振幅的变化而变化,源自 Smith 和 Honing (2008)。上图显示脉冲振幅,节拍在 4.2 到 11.2 秒的周期内从 3/4 变为 4/4。尺度图(中间)和相位图(下方)显示了节律脉冲函数的连续小波变换。尺度图上,在较短的起始间隔尺度上,脉冲的强度变化清晰可见,能量最高的时频脊线位于 0.35 秒处,与起始间隔一致。尺度图上可见一条能量较低的脊线,相位图上更清晰可见,其周期从 1.05 秒变为 1.4 秒,与节拍的时长一致。相位图上用黑线标记了这条脊线。
Figure 38.8 A constant frequency rhythm changing in meter by variations in amplitude, after Smith and Honing (2008). The upper plot shows the impulse amplitudes, with the meter changing from 3/4 to 4/4 over the period of 4.2 to 11.2 seconds. The scalogram (center) and phasogram (lower) plots display a continuous wavelet transform of the rhythmic impulse function. The intensity variations of the impulses are discernible in the scalogram at short inter-onset interval scales, and the time-frequency ridge with the most energy is at 0.35 seconds, matching the inter-onset interval. A lower energy ridge is visible on the scalogram and more clearly on the phasogram, changing in its period from 1.05 seconds to 1.4 seconds, matching the duration of the bar. It is marked on the phasogram as a black line.
图 38.9 用小波分离谐波谱中的噪声。幅值(垂直)随时间(水平)变化的曲线图。上图为原始吉他音色。中间部分是梳状小波变换后的含噪残差,其中包含音符的特征起音部分。下图显示了梳状小波方法中准谐波部分的再合成。(图片由 Gianpaolo Evangelista 提供。)
Figure 38.9 Wavelet separation of noise from harmonic spectrum. Amplitude (vertical) versus time (horizontal) plots. The top part is the original guitar tone. The middle part is the noisy residual from the comb wavelet transform, which includes the characteristic attack part of the note. The bottom figure shows the resynthesis from the quasiharmonic part of the comb wavelet method. (Figure courtesy of Gianpaolo Evangelista.)
图 38.10 维格纳分布图。
Figure 38.10 Wigner distribution plots.
图 38.11 三种时频表示的比较。(a) Curtis Roads 的Pictor alpha(2003)中 100 毫秒提取的波形。(b)短时傅里叶变换的频谱图。(c)使用 Gabor 小波进行离散小波变换的尺度图。(d)使用 Gabor 原子字典进行原子分解的 Wivigram。来自 Sturm 等人(2009 年)。
Figure 38.11 Comparison of three time-frequency representations. (a) Waveform of a 100 ms extract from Pictor alpha (2003) by Curtis Roads. (b) Spectrogram from short-time Fourier transform. (c) Scalogram from discrete wavelet transform using Gabor wavelet. (d) Wivigram from atomic decomposition using a dictionary of Gabor atoms. From Sturm et al. (2009).
图 38.12 几种频谱估计器的比较。倒谱法可以找到整体频谱包络。LPC 最擅长提取谱峰。(有关倒谱的解释,请参阅第 34 章。)摘自 IRCAM (2011)。
Figure 38.12 Comparing several spectrum estimators. The cepstrum finds the overall spectrum envelope. LPC is best at extracting spectral peaks. (See chapter 34 for an explanation of cepstrum.) From IRCAM (2011).
图 38.13 美式双元音ree的扩展耳蜗图。水平线表示前三个共振峰轨道。垂直线表示声门脉冲,由于通过耳蜗的自然延迟而略微倾斜。源自 Slaney 和 Lyon (1992)。
Figure 38.13 Expanded cochleagram of the American diphthong ree. The horizontal lines indicate the first three formant tracks. The vertical lines indicate glottal pulses, which are tilted slightly due to the natural delay through the cochlea. From Slaney and Lyon (1992).
图 38.14 钟鸣相关图。(a) 开始。(b) 600 毫秒。(c) 2.0 秒。(a) 中尤为明显的 U 形曲线,是由时间网格的连续划分产生的——就像你看到的是一段频率的波形峰值,底部是低频(因此峰值之间的周期更长)。摘自 Slaney 和 Lyon (1992)。
Figure 38.14 Correlogram of the striking of a chime. (a) Onset. (b) 600 ms. (c) 2.0 seconds. The U-shaped curve, particularly evident in (a), results from successive divisions of the grid in time—as if you were looking at the waveform peaks of a band of frequencies, with low frequencies (and therefore longer periods between peaks) at the bottom. From Slaney and Lyon (1992).
图 39.1 原始干净信号(左上),随后被噪声污染(右上)。下方五个波形是通过匹配追踪从 Gabor 原子(调制高斯窗)和狄拉克脉冲字典中筛选出的原子。最终波形r (5) 显示了最终的残差(移除五个原子后的初始信号)。每幅图底部的两个时频图分别是分解后的维格纳-维尔分布图(wivigram),分解后的维格纳-维尔分布图是通过匹配追踪筛选出的单个原子的维格纳-维尔分布与短时傅里叶变换 (STFT) 的叠加。每个维格纳-维尔分布图的时频局部化程度明显优于短时傅里叶变换 (STFT)。
Figure 39.1 The original clean signal (top left), which is then corrupted by noise (top right). The five waveforms below these are the atoms selected by matching pursuit from a dictionary of Gabor atoms (modulated Gaussian windows) and Dirac impulses. The final waveforms r(5) show the resulting residual (the initial signal with the five atoms removed). The two time-frequency plots at the bottom of each figure are the wivigram of the decomposition (a superposition of the Wigner-Ville distributions of the individual atoms selected by matching pursuit) and the short-time Fourier transform (STFT). The time-frequency localization of each wivigram is clearly superior to that of the STFT.
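Matching pursuit is a greedy loop. At each step, it finds the dictionary atom with the largest inner product against the residual, records that atom and its coefficient, and subtracts the atom's contribution. The following is a minimal sketch with a toy dictionary of Dirac impulses plus one constant atom. That dictionary is a stand-in for the much larger Gabor dictionaries used in the figure. Atoms are assumed to be unit-norm.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def matching_pursuit(signal, dictionary, num_atoms):
    """Greedy decomposition; returns ([(atom_index, coefficient), ...], residual)."""
    residual = list(signal)
    chosen = []
    for _ in range(num_atoms):
        # Correlate the residual with every atom
        scores = [sum(r * g for r, g in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        # Remove the selected atom's contribution from the residual
        residual = [r - coef * g for r, g in zip(residual, dictionary[best])]
        chosen.append((best, coef))
    return chosen, residual
```

Because the atoms are not orthogonal, the first coefficient absorbs part of the other component. The residual energy still falls with each step.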
图 39.2使用匹配追踪工具包(Krstulovic 和 Gribonval,2006)和包含 5,825,779 个 Gabor 原子的字典,对 Curtis Roads 的《Pictor Alpha》 (2004) 的前七秒进行分解。上图:Spikegram 表示,一种指示事件精确开始的点阵图案。中图:Wivigram 表示。下图:投影到短时傅里叶变换字典。
Figure 39.2 Decomposition of the first seven seconds of Pictor Alpha (2004) by Curtis Roads with the Matching Pursuit Toolkit (Krstulovic and Gribonval 2006) and a dictionary of 5,825,779 Gabor atoms. Top: Spikegram representation, a pattern of dots that indicate the precise onset of events. Center: Wivigram representation. Bottom: Projection onto short-time Fourier transform dictionary.
图 39.3 SCATTER 应用程序的屏幕截图。我们加载了使用多尺度 Gabor 字典对音乐信号进行匹配追踪分解的结果。中间窗口显示了 Wivigram 表示及其上方的时域重合成。使用左侧的工具,我们可以选择特定的原子或原子区域进行调整。此处,我们用套索工具选择了一组原子,并在时间和频率上进行了位移。右侧窗格显示了各种选项以供进一步选择;例如,仅选择组中具有最小持续时间或振幅的原子。
Figure 39.3 Screenshot of the application SCATTER. We loaded the results of a matching pursuit decomposition of a musical signal using a multiscale Gabor dictionary. The center window shows the wivigram representation and the time-domain resynthesis above it. With the tools at the left, we can select specific atoms or regions of atoms to adjust. Here, a group of atoms has been selected with the lasso tool and displaced in time and frequency. The pane at right shows a variety of options for further selection; for example, select only those atoms in our group having a minimum duration or amplitude.
图 40.1 Arduino 微控制器板。传感器可以连接到该板上,使其成为音乐输入设备。
Figure 40.1 Arduino microcontroller board. Sensors can be attached to this board to turn it into a musical input device.
图 40.2 电子输入设备将手势与发声机制分离。任意一个输入设备都可以产生相同的声音。
Figure 40.2 Electronic input devices detach the gesture from the sound production mechanism. Any one of a number of input devices can generate the same sound.
图 40.3 输入设备作为连接到电子接口的传感器的模型。
Figure 40.3 Model of an input device as a sensor connected to an electronic interface.
图 40.4 罗伯特·穆格 (Robert Moog) 演示演奏特雷门琴的正确位置,出自1960 年Vanguard Theremin Model 505 操作说明。
Figure 40.4 Robert Moog demonstrating the correct position for playing the Theremin, from Operating Instructions for the Vanguard Theremin Model 505, 1960.
图 40.5 使用传统乐器作为计算机音乐的控制器(Negyesy 和 Ray 1989)。小提琴家 Janos Negyesy 正在演奏一把装有传感器的小提琴,用于控制电子音乐硬件。左图:系统设计师 Lee Ray。
Figure 40.5 Using a traditional instrument as a controller for computer music (Negyesy and Ray 1989). Violinist Janos Negyesy playing a violin equipped with sensors to control electronic music hardware. Left: Lee Ray, system designer.
图 40.6a 电子和计算机音乐输入设备的照片拼贴(1960–2020),键盘。
Figure 40.6a Photocollage of electronic and computer music input devices (1960–2020), Keyboards.
图 40.6b 电子和计算机音乐输入设备的照片拼贴(1960-2020 年),杂项。
Figure 40.6b Photocollage of electronic and computer music input devices (1960–2020), Miscellaneous.
图 40.6c 电子和计算机音乐输入设备的照片拼贴画(1960–2020),手套和戒指。
Figure 40.6c Photocollage of electronic and computer music input devices (1960–2020), Gloves & Rings.
图 40.6d 电子和计算机音乐输入设备的照片拼贴(1960–2020),超声波和红外线。
Figure 40.6d Photocollage of electronic and computer music input devices (1960–2020), Ultrasound & Infrared.
图 40.6e 电子和计算机音乐输入设备的照片拼贴(1960–2020),Brain & VR。
Figure 40.6e Photocollage of electronic and computer music input devices (1960–2020), Brain & VR.
图 40.6f 电子和计算机音乐输入设备的照片拼贴(1960–2020),弦乐。
Figure 40.6f Photocollage of electronic and computer music input devices (1960–2020), Strings.
图 40.6g 电子和计算机音乐输入设备的照片拼贴画(1960–2020),打击乐。
Figure 40.6g Photocollage of electronic and computer music input devices (1960–2020), Percussion.
图 40.6h 电子和计算机音乐输入设备的照片拼贴画(1960–2020),Winds。
Figure 40.6h Photocollage of electronic and computer music input devices (1960–2020), Winds.
图 40.6i 电子和计算机音乐输入设备的照片拼贴(1960–2020),滑块、按钮和旋钮(MIDI 推子盒)。
Figure 40.6i Photocollage of electronic and computer music input devices (1960–2020), Sliders, Buttons, and Knobs (MIDI fader boxes).
图 40.7 输入设备和合成器之间的软件可以解释和重新映射乐器演奏者的手势。
Figure 40.7 Software in between the input device and the synthesizer allows the possibility of interpreting and remapping the instrumentalist’s gestures.
图 40.8 带有长冲程推子的专业便携式音频混音器。
Figure 40.8 Professional portable audio mixer with long-throw faders.
图 40.9 Waldorf Quantum 数字合成器键盘(2019 年),带显示屏,61 键,左侧两个用于控制弯音和颤音的拇指轮,以及数十个 LED 指示灯的旋钮和按钮。设备背面有脚踏板输入插孔、音频输入和输出插孔,以及 USB 和 MIDI 输入和输出插孔。
Figure 40.9 Digital synthesizer keyboard of the Waldorf Quantum (2019) with display screen, 61 keys, two thumbwheels at the left for pitch bend and vibrato, and dozens of LED-lit rotary knobs and buttons. On the back of the unit are jacks for foot pedal inputs, audio inputs and outputs, and USB and MIDI input and output.
图 40.10 键盘中母线的水平剖面图。按下按键时,触点底座从上部母线移动到下部母线。
Figure 40.10 Horizontal cutaway view of bus bars in a keyboard. The contact plinth moves from the upper to the lower bus bar when the key is pressed.
图 40.11 Elaine Walker 制作的 Bohlen-Pierce 音阶键盘上的拇指轮。
Figure 40.11 Thumbwheels on keyboard for the Bohlen-Pierce scale made by Elaine Walker.
图 40.12 自动伴奏的舞者。Bösendorfer Disklavier 再生钢琴的广告。
Figure 40.12 Dancers with automatic accompaniment. Advertisement for Bösendorfer Disklavier reproducing piano.
图 40.13 Doepfer LMK4 + 88 键重锤式键盘。重 24 公斤,售价约 1,800 美元。
Figure 40.13 Doepfer LMK4+ 88-key hammer-action keyboard. It weighs 24 kg and costs around $1,800.
图 40.14 分割键盘被分成几个区域,每个区域通过不同的 MIDI 通道进行传输。
Figure 40.14 A split keyboard is divided into several regions, each of which transmits over a different MIDI channel.
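Routing on a split keyboard amounts to looking up which region a note number falls in. In this sketch, the split points and channel numbers in the test are hypothetical, and a split point is assumed to belong to the region above it.

```python
import bisect

def channel_for_note(note, split_points, channels):
    """Return the MIDI channel for a note number on a split keyboard.

    split_points: ascending note numbers where a new region begins.
    channels: one channel per region, so len(channels) == len(split_points) + 1.
    """
    if len(channels) != len(split_points) + 1:
        raise ValueError("need one channel per region")
    return channels[bisect.bisect_right(split_points, note)]
```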
图 40.15 两台 Roli Seaboard 正在运行。这些设备是无线的,通过蓝牙协议进行通信。
Figure 40.15 Two Roli Seaboards in action. The devices are wireless, communicating via the Bluetooth protocol.
图40.16 Joel Chadabe 指挥一台新英格兰数字同步合成器,它使用了由 Robert Moog 设计并制造的改良型特雷门琴天线。演出地点:纽约 The Kitchen 表演空间,1979年。(摄影:Carlo Carnevali)
Figure 40.16 Joel Chadabe conducting a New England Digital Synclavier synthesizer using modified theremin antennae designed and built by Robert Moog. Performance at The Kitchen performance space, New York City, 1979. (Photograph by Carlo Carnevali.)
图 41.1 马克斯·马修斯 (Max Mathews) 弹奏音乐键盘(右手)并控制连接到 1970 年投入使用的 GROOVE 混合合成器的旋钮(左手)。
Figure 41.1 Max Mathews playing a musical keyboard (right hand) and controlling a knob (left hand) connected to the GROOVE hybrid synthesizer, which became operational in 1970.
图 41.2 音序器概览,显示(顺时针)以下操作:(1) 音乐数据输入、(2) 多轨录音、(3) 编辑、(4) 保存编辑后的数据,以及 (5) 演奏。
Figure 41.2 Overview of a sequencer, showing (clockwise) the operations of (1) music data entry, (2) multitracking, (3) editing, (4) saving edited data, and (5) performing.
图 41.3 Korg SQ-1 硬件音序器产生 MIDI 和控制电压。
Figure 41.3 Korg SQ-1 hardware sequencer generates both MIDI and control voltages.
图 41.4 布鲁日古代钟琴的步进模式编程(摘自 Buchner [1978])。
Figure 41.4 Step mode programming an ancient carillon at Bruges (from Buchner [1978]).
图 41.5 在连续的纸卷上记录旋律图序列。乐器的键下方有一条黄铜条,连接到电池的正极。按下某个键时,弹簧会与该键对应的导线接触。电路闭合,电流通过梳齿所在的纸张。只要按下键,就会发生化学反应,并在移动的纸张上产生一条彩色线条 (Ord-Hume 1973)。
Figure 41.5 Melograph sequence recording on a continuous roll of paper. Below the keys of the instrument lies a brass strip connected to the positive terminal of a battery. If a key is depressed, a spring establishes contact with the corresponding wire of the key. The circuit is closed, and current passes through the paper where the corresponding tooth of a comb lies. A chemical reaction takes place and produces a colored line on the moving paper as long as the key is depressed (Ord-Hume 1973).
图 41.6 纽约市哥伦比亚/普林斯顿电子音乐中心的 RCA Mark II 合成器。打字机在纸带上打孔,然后将纸带送入合成器的控制机构。
Figure 41.6 RCA Mark II Synthesizer at the Columbia/Princeton Electronic Music Center in New York City. The typewriters punch holes in paper tape, which is fed into the control mechanism of the synthesizer.
图 41.7 Moog 960 模拟音序器模块。音序器的 24 个步骤分为三行。左侧是时钟模块。每列上方的指示灯指示当前活动步骤。
Figure 41.7 Moog 960 analog sequencer module. The 24 steps of the sequencer are divided into three rows. A clock module is at the left. Lights above each column indicate active steps.
图 41.8 Z-Machines 机器人吉他手,拨弦机制的细节(Suzuki 2018)。
Figure 41.8 Z-Machines robot guitar player, detail of plucking mechanism (Suzuki 2018).
图 41.9 使用外部 MIDI 硬件乐器和效果器单元时的音序器演奏设置。音序器音轨与通道、乐器和音色的映射。
Figure 41.9 Sequencer performance setup in the case of external MIDI hardware instruments and effects units. Mapping of sequencer tracks to channels, instruments, and patches.
图 41.10 Novation Launchpad Pro,一款流行的控制器,旨在启动 Ableton Live 中的剪辑和序列。
Figure 41.10 Novation Launchpad Pro, a popular controller designed to launch clips and sequences in Ableton Live.
图 41.11 音序器轨道的内部表示。(a) 编辑前,轨道可以表示为一个数组。(b) 编辑期间和编辑后,轨道将变为一个链表,其中混合了各个事件和子数组。此处,事件 b已从第二个元素移至最后一个元素位置。
Figure 41.11 Internal representation of a sequencer track. (a) Before editing, the track can be represented as an array. (b) During and after editing, the track becomes a linked list, intermingling individual events and subarrays. Here event b has been moved from the second element to the last element position.
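The edit in panel (b) is a constant-time unlink followed by an append at the tail. This sketch simplifies the figure by giving every node a single event and omitting the subarrays. The `Event` class and helper names are hypothetical.

```python
class Event:
    """One node of a singly linked sequencer track."""
    def __init__(self, name, next_event=None):
        self.name = name
        self.next = next_event

def from_list(names):
    head = None
    for name in reversed(names):
        head = Event(name, head)
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.name)
        head = head.next
    return out

def move_to_end(head, name):
    """Unlink the first event with this name and append it at the tail."""
    prev, node = None, head
    while node is not None and node.name != name:
        prev, node = node, node.next
    if node is None or node.next is None:
        return head                      # not found, or already last
    if prev is None:
        head = node.next                 # moving the old head
    else:
        prev.next = node.next
    tail = head
    while tail.next is not None:
        tail = tail.next
    tail.next, node.next = node, None
    return head
```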
图 41.12 音序器程序同时录音和播放的操作。音序器将现有音轨与实时演奏的输入合并。
Figure 41.12 Operation of a sequencer program simultaneously recording and playing back. The sequencer merges existing tracks with input from a real-time performance.
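Merging a new take with existing tracks can be sketched as a time-ordered merge of two sorted event lists. In this sketch, events are assumed to be (time, event) pairs. It runs offline, whereas a real sequencer merges incrementally while playing back.

```python
import heapq

def merge_tracks(existing, recorded):
    """Merge two time-sorted lists of (time, event) pairs into one sorted list."""
    # On equal times, heapq.merge keeps events from the existing track first
    return list(heapq.merge(existing, recorded, key=lambda e: e[0]))
```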
图 41.13 两个事件流(每个玩家一个)在多个时间尺度上的投影。(在原始图中,流具有不同的颜色。)图像右侧的圆圈和三角形表示五秒时间尺度上的频率(从左到右)、振幅(从下到上)和光谱亮度(符号)。圆圈表示光谱平坦度,三角形表示更亮的元素。当信号变得更嘈杂/更亮时,会出现菱形、正方形和阴影正方形(未显示)。实心圆圈表示起始点。图像背面是瞬时频谱显示以及先前值的部分不透明瀑布图。微小的点显示更长时间尺度(四十五秒)的活动直方图。大的实心框作为频率、振幅和音色的短期视图,可以轻松看到重复的模式。图片由 Julian Rawlinson 提供。
Figure 41.13 Projection of two event streams (one from each player) on multiple time scales. (In the original the streams have different colors.) The circles and triangles on the right of the image indicate frequency (left to right), amplitude (bottom to top), and spectral brightness (symbol) over a five-second time scale. A circle indicates spectral flatness, and a triangle indicates brighter elements. Diamonds, squares, and hatched squares (not shown) appear as the signal becomes noisier/brighter. Solid circles indicate onsets. At the back of the image is an instantaneous spectrum display plus a partly opaque waterfall plot of previous values. Tiny dots show a histogram of activity on a longer time scale (forty-five seconds). The big solid boxes serve as a short-term view of frequency, amplitude, and timbre and make it easy to see repeating patterns. Image courtesy of Julian Rawlinson.
图41.14 莫顿·苏博特尼克(Morton Subotnick)1983年在纽约林肯中心的现场音乐会。合成器是由计算机控制的Buchla模拟系统。(摄影:B. Bial)
Figure 41.14 Morton Subotnick, live in concert at Lincoln Center, New York in 1983. The synthesizer is a computer-controlled Buchla analog system. (Photograph by B. Bial.)
图 41.15 计算机伴奏系统概述。它聆听人类演奏者的演奏,并生成与人类演奏者的方向和流程相匹配的伴奏。
Figure 41.15 Overview of a computer accompaniment system. It listens to a human performer and generates an accompaniment that matches the direction and flow of the human performer.
图 41.16 1996 年 Sensorband 的艺术家们演奏 Soundnet 乐器。
Figure 41.16 Artists of the Sensorband playing the Soundnet instrument in 1996.
图 41.17 The Hub,一支先锋计算机乐队,正在加州奥克兰米尔斯学院举办音乐会。每位音乐家都与一台计算机互动,而计算机又与其他计算机互动,从而创造出一场合奏表演。从左到右:T. Perkis、P. Stone、C. Brown、S. Gresham-Lancaster、M. Trayle、J. Bischoff。(图片版权归 Jim Block 所有。)
Figure 41.17 The Hub, a pioneering computer band, performing in concert at Mills College, Oakland, California. Each musician interacts with a computer that in turn interacts with other computers to create an ensemble performance. From left to right: T. Perkis, P. Stone, C. Brown, S. Gresham-Lancaster, M. Trayle, J. Bischoff. (Photograph copyright Jim Block.)
图 41.18 ChucK 音乐编程语言的共同发明人王戈在 2008 年左右的表演中现场编码。代码被投影到屏幕上供观众观看。
Figure 41.18 Ge Wang, co-inventor of the ChucK music programming language, live coding in performance around 2008. The code is projected on-screen for the audience to see.
图 41.19 莉兹·菲利普斯的互动装置作品《声音桌 II》, 1992 年在纽约市 Threadwaxing 空间展出。桌上的水会传导和传输电容场。当观众/听众靠近桌子时,桌子下方的扬声器会发出回响的共振。声音会根据靠近桌子的人的手势做出反应。他们可以看到声能波震动水面,在水池中形成一个框架图像。(摄影:R. Winard)
Figure 41.19 An interactive installation by Liz Phillips. Soundtable II, 1992 installation at the Threadwaxing Space, New York City. The water in the table conducts and transmits a capacitance field. As viewer/listeners move near the table, a loudspeaker under it emits reverberated resonances. Sound formations respond to the gestures of those who approach the table. They can watch as waves of sound energy vibrate the surface of the water, creating a framed image in the pool. (Photograph by R. Winard.)
图 42.1 Pro Tools 屏幕显示四个 MIDI 音轨(垂直堆叠),以钢琴卷帘时间线符号显示,由四个不同的软件乐器演奏。由于选择了音轨 4,因此其软件乐器显示在底部。
Figure 42.1 Pro Tools screen showing four MIDI tracks (stacked vertically) in piano-roll time line notation played by four different software instruments. Track 4 is selected, so its software instrument is displayed at bottom.
图 42.2 控制器通道中的数据与钢琴卷帘窗中音符数据的叠加显示。通道 1、2 和 4 显示连续包络。通道 3 显示 MIDI 离散音符力度值,绘制为垂直尖峰。
Figure 42.2 Data in controller lanes superimposed over note data in piano roll display. Lanes 1, 2, and 4 show continuous envelopes. Lane 3 shows MIDI discrete note velocity values plotted as vertical spikes.
图 42.3 顶部钢琴卷帘音符显示下方显示四个控制器通道。音符力度数据显示为垂直尖峰。弯音是一个包络。音色变化用编号框表示。MIDI 声像是一个在左下角和右下角之间摆动的随机包络。
Figure 42.3 Four controller lanes displayed beneath the piano-roll note display at top. Note velocity data is shown as vertical spikes. Pitch bend is an envelope. Program changes are indicated by numbered boxes. MIDI pan is a random envelope swinging between left and bottom right.
图 42.4 Ableton Live,一款循环导向的音序器。在 Session 视图中,各列(鼓、贝斯、合成器等)显示可在演奏中触发的循环。Arrangement 视图(未显示)用于在时间线上进行创作。用户通常在 Arrangement 视图中创作,然后将各个片段传输到 Session 视图进行现场演奏。图片由 Chris Ozley 提供。
Figure 42.4 Ableton Live, a loop-oriented sequencer. In Session view, the columns (Drums, Bass, Synths, etc.) show loops that can be triggered in performance. Arrangement view (not shown) is used for composition on a timeline. Users often compose in Arrangement view and then transfer individual clips into Session view for live performance. Image courtesy of Chris Ozley.
图 42.5 Numerology 步进音序器屏幕细节。顶部为复音序列,下方为每步门限和力度控制。
Figure 42.5 Detail of Numerology step sequencer screen. A polyphonic sequence (top) with per-step gate and velocity controls beneath.
图 42.6 Reaper 中的钢琴卷帘符号。
Figure 42.6 Piano roll notation in Reaper.
图 42.7 Logic Pro 中的 MIDI 事件列表。事件列表视图(右)显示在轨道视图(左)中选择的所有 MIDI 事件。
Figure 42.7 MIDI event list in Logic Pro. The event list view (right) displays all MIDI events that have been selected in the track view (left).
图 42.8 MIDI 和 CMN。在 Pro Tools 中,使用内置的 Sibelius 乐谱编辑器显示巴赫C 大调前奏曲与赋格BWV 846。右侧显示相应的钢琴卷帘窗。
Figure 42.8 MIDI and CMN. Bach Prelude and Fugue in C Major BWV 846 shown in Pro Tools with the built-in Sibelius notation editor. The corresponding piano-roll display can be seen on the right.
图 42.9 鼓序列编辑器中的韵律网格。
Figure 42.9 Metrical grid in a drum sequence editor.
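One common use of such a grid is snapping played onsets to it (quantization). This sketch assumes times and grid spacing are given in beats. The partial-strength option is hypothetical and only mirrors a typical editor feature. Python's `round` breaks exact ties toward the even grid line.

```python
def quantize(onsets, grid, strength=1.0):
    """Move each onset toward its nearest grid line; strength 1.0 snaps fully."""
    out = []
    for t in onsets:
        nearest = round(t / grid) * grid
        out.append(t + strength * (nearest - t))
    return out
```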
图 42.10 Logic 中的控制器包络。
Figure 42.10 Controller envelope in Logic.
图 42.11 Pro Tools 中四个乐器(MIDI 插件)轨道和四个外部 MIDI 轨道上的 MIDI 音量推子。
Figure 42.11 MIDI volume faders on four instrument (MIDI plug-in) tracks and four external MIDI tracks in Pro Tools.
图 42.12 包含 MIDI 和音频轨道的 DAW 会话。轨道 1-4 为 MIDI 轨道。轨道 3 和 4 显示在控制器通道中进行编辑。立体声轨道 5 和 6 为音频轨道。轨道 5 的开始时间晚于轨道 6。
Figure 42.12 DAW session with combination of MIDI and audio tracks. Tracks 1–4 are MIDI. Tracks 3 and 4 show editing in controller lanes. Stereo tracks 5 and 6 are audio. Track 5 starts later than track 6.
图 43.1 Audacity 声音编辑器中同一种声音(语音)的五种不同显示方式。从上到下依次为:振幅、以 dB 为单位的振幅、线性频率刻度的频谱、对数频率刻度的频谱以及音高曲线。
Figure 43.1 Five different displays of the same sound (spoken voice) in the Audacity sound editor. From top to bottom: Amplitude, amplitude in dB, spectrum with linear frequency scale, spectrum with log frequency scale, and pitch curve.
图 43.2 拼接。(a)简单的粘贴命令相当于硬剪切。这会将一个声音的结尾与另一个声音的开头并列。(b)交叉淡入淡出使两个声音之间的过渡更加平滑。在本例中,交叉淡入淡出时间为 90 毫秒,在此期间,第一个声音逐渐减弱,第二个声音逐渐增强。
Figure 43.2 Splicing. (a) A simple paste command is equivalent to a hard cut. This juxtaposes the end of one sound with the beginning of another. (b) A crossfade smooths the transition between the two sounds. In this case, the crossfade time is 90 ms, during which the first sound fades down and the second sound fades up.
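A crossfade overlaps the tail of one sound with the head of the next while one gain falls and the other rises. The following is a minimal sketch with linear ramps. Equal-power curves are a common alternative that avoids a dip in loudness at the midpoint. At 44.1 kHz, the 90 ms crossfade in the caption corresponds to 3,969 samples.

```python
def crossfade(a, b, fade_len):
    """Join sample lists a and b, overlapping fade_len samples with linear ramps."""
    if not 0 < fade_len <= min(len(a), len(b)):
        raise ValueError("fade_len must fit inside both sounds")
    head = a[:len(a) - fade_len]
    overlap = []
    for i in range(fade_len):
        g = i / fade_len                       # gain of b rises from 0 toward 1
        overlap.append(a[len(a) - fade_len + i] * (1 - g) + b[i] * g)
    return head + overlap + b[fade_len:]
```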
图 43.3 两个萨克斯音符的淡入淡出。(a)和(b)显示了原始两个音符的振幅包络,音高相差小三度。(c)和(d)是它们的淡入淡出版本。(e)是(c)与(d)淡入淡出的结果。
Figure 43.3 Crossfading two saxophone notes. (a) and (b) show the amplitude envelopes of the original two notes, a minor third apart in pitch. (c) and (d) are their faded versions. (e) is the result of crossfading (c) with (d).
图 43.4 声音编辑器中的淡入功能。(a)线性。(b)四分之一正弦。(c)对数。
Figure 43.4 Fade-in functions in a sound editor. (a) Linear. (b) Quarter-sine. (c) Logarithmic.
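The three shapes can be written as gain curves that rise from 0 to 1. The linear and quarter-sine formulas are standard. Editors define a "logarithmic" fade in different ways, so the curve log10(1 + 9x) used here is an assumption chosen only because it rises quickly and then levels off.

```python
import math

def fade_in(n, shape="linear"):
    """Return n gains (n >= 2) rising from 0 to 1 with the given shape."""
    gains = []
    for i in range(n):
        x = i / (n - 1)                      # normalized position, 0..1
        if shape == "linear":
            g = x
        elif shape == "quarter-sine":
            g = math.sin(x * math.pi / 2)
        elif shape == "logarithmic":
            g = math.log10(1 + 9 * x)        # assumed curve, see lead-in
        else:
            raise ValueError("unknown shape: " + shape)
        gains.append(g)
    return gains
```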
图 43.5 直流偏移。波形中心偏离 0。
Figure 43.5 DC offset. The center of the waveform is offset from 0.
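For a constant offset, the simplest correction subtracts the mean of the waveform so that it is centered on 0 again. The following is a minimal sketch. An offset that drifts over time would call for a high-pass filter instead.

```python
def remove_dc(x):
    """Subtract the mean so the waveform is centered on 0."""
    mean = sum(x) / len(x)
    return [s - mean for s in x]
```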
图 43.6 Reason 中的音高编辑界面。音符控制柄(音符上方的深色矩形)用于调整所选音符的振幅。向下移动偏移控制柄(顶部带有白色圆圈的竖条)可减少颤音和音高偏移。音符中心的双箭头是音高偏移控制。
Figure 43.6 Pitch editing interface in Reason. The note handle (dark rectangle above the note) adjusts the amplitude of the selected note. Moving the drift handle (vertical bar with the white circle on top) down reduces vibrato and pitch drift. A double arrow on the center of the note is the pitch shift control.
图 43.7 四个 DAW 窗口的图像。(a)Ableton Live Session 窗口,可在其中实时触发声音。(b)Pro Tools 混音窗口。这是作者为《Always》(2013 年)创作的子混音会话,将四首新曲目混音到顶部音轨的立体声主干中。(c)Cubase Pro 10.5,显示曲目时间线、混音器、插件乐器和效果器,以及带有时间码的视频。(d)Linux 操作系统版 Ardour,显示Rodney Duplessis 的《De Rerum Natura》(2020 年)。
Figure 43.7 Image of four DAW windows. (a) Ableton Live Session window, in which sounds can be triggered in real time. (b) Pro Tools mix window. A submixing session for Always (2013) by the author, mixing four new tracks into the stereo stem in the top track. (c) Cubase Pro 10.5, showing track time line, mixer, plug-in instruments and effects, and a video with timecode. (d) Ardour for Linux OS, showing De Rerum Natura (2020) by Rodney Duplessis.
图43.7 (续)
Figure 43.7 (continued)
图 43.8 Avid S6 控制台。S6 支持 EUCON,这是 Avid 的高速以太网控制协议,允许其控制界面连接到各种音频和视频软件,包括 Pro Tools、Media Composer、Logic Pro、Cubase、Nuendo、Premiere、Audition 等。
Figure 43.8 Avid S6 console. The S6 supports EUCON, Avid’s high-speed Ethernet control protocol that allows its control surfaces to connect to a variety of audio and video software, including Pro Tools, Media Composer, Logic Pro, Cubase, Nuendo, Premiere, Audition, and others.
图 43.9 Wwise 中的波形编辑器视图。
Figure 43.9 View of the waveform editor in Wwise.
图 44.1 SoundMagic Spectral频谱冻结插件。
Figure 44.1 SoundMagic Spectral spectral freeze plug-in.
图 44.2 白噪声的二维频谱图。横轴表示频率,纵轴表示振幅(以分贝为单位)。
Figure 44.2 2D spectrum display of white noise. Frequency is plotted on the horizontal axis logarithmically. The vertical scale is amplitude in decibels.
图 44.3 Alchemy 中的静态频谱编辑器。顶部面板显示波形;底部面板显示频谱,标记为正弦分音符。
Figure 44.3 Static spectrum editor in Alchemy. The top panel shows the waveform; the bottom shows the spectrum, labeled as sinusoidal partial numbers.
图 44.4 从谐波(a)合成波形(b)。
Figure 44.4 Synthesizing a waveform (b) from harmonics (a).
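Building a waveform from harmonics is additive synthesis. Each harmonic is a sine at an integer multiple of the fundamental, scaled by its amplitude, and the harmonics are summed. The sketch below computes one period into a wavetable. The zero starting phase for every harmonic and the 1,024-point table size are assumptions.

```python
import numpy as np


def waveform_from_harmonics(amplitudes, table_size: int = 1024) -> np.ndarray:
    """Sum sine harmonics into one period of a waveform (additive synthesis).

    amplitudes[0] is the fundamental, amplitudes[1] the 2nd harmonic, and so on.
    All harmonics start at zero phase, which is an assumption.
    """
    phase = 2 * np.pi * np.arange(table_size) / table_size
    wave = np.zeros(table_size)
    for k, amp in enumerate(amplitudes, start=1):
        wave += amp * np.sin(k * phase)
    return wave


# Harmonic amplitudes 1/k approximate a sawtooth as more harmonics are added.
saw_like = waveform_from_harmonics([1.0 / k for k in range(1, 17)])
```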
Figure 44.5 Fairlight CMI screen, 1979. The user adjusts amplitude envelopes for each harmonic of an analyzed sound.
Figure 44.6 Control functions for twenty-four partials of a violin tone, after Strawn (1987b). Most 3D plots of this kind show the amplitudes of the harmonics relative to the maximum of all of them; this plot shows each harmonic scaled to its own maximum, an option that makes it easier to see detail in the higher harmonics. One can add, move, and delete breakpoints. In the example shown in (b), the attacks of the harmonics were cleaned up by manual editing. This is especially easy to see in the fundamental. Also, some detail, especially in the upper harmonics, was removed. This extraneous detail resulted when the line-segment approximation algorithm “worked too hard” in a few cases. Such detail must be removed for more than cosmetic reasons: according to Strawn (1987b), audible artifacts called breebles occur if the amplitudes of the upper harmonics change too rapidly relative to each other.
Figure 44.7 SpecDraw separation of vocals from orchestra. The vocal elements follow a common vibrato pattern.
Figure 44.8 Breakpoint function drawn over the sonographic display in AudioSculpt.
Figure 44.9 IrcamLab TS2 editor.
Figure 44.10 Spectral editing in Audacity. Defining a region of the spectrum for transformation. Any plug-in effect can be applied to the selected region.
Figure 44.11 Using the spectral display in Audition to locate and remove telephone noise from a speech recording. (a) Zoomed-in image of telephone ring (the periodic dot pattern) mixed with speech. (b) Selection of telephone noise. (c) The result of deleting the telephone noise.
Figure 44.12 Audition FFT filter.
Figure 44.13 LemurEdit for editing tracking phase vocoder images.
Figure 44.14 Spectrum editing using SPEAR. The audio is speech. The partials selected by the box in (a) are moved to a new time-frequency position in the spectrum in (b).
Figure 44.15 SCATTER app. The circular area with white space around it has been selected and moved.
Figure 44.16 iZotope RX in denoise mode. The denoise control window is in front, displaying the spectral profile of the process. In this Light denoising operation, low-frequency noises are attenuated below −60 dB, and higher frequencies are attenuated below −80 dB.
Figure 44.17 Using the iZotope RX spectrum editor, one can poke a cavity in the middle of a speech sound.
Figure 44.18 Hit′n′Mix Infinity editor.
Figure 45.1 LilyPond text and score.
Figure 45.2 One of the earliest graphical music notation editors, developed at the National Research Council, Canada, in the late 1960s (described in Pulfer 1971). Notice that the operator is using positional controllers rather than the alphanumeric keyboard to select and edit notes.
Figure 45.3 Leland Smith, Center for Computer Research in Music and Acoustics, Stanford University, 1976.
Figure 45.4 Dataland Scan-note system score printout on a large-format pen plotter (1978).
Figure 45.5 Example of music printing using the Mockingbird score editor.
Figure 45.6 Score printed by the Synclavier Music Engraving System.
Figure 45.7 Music symbols and examples of the graphics-based notation editor NoteWriter.
Figure 45.8 The StaffPad app turns writing with a stylus onscreen into typeset music notation. It also takes voice commands such as “add a piano.”
Figure 45.9 Notation editors provide easy extraction of parts from a full score. (a) Full score, Symphony in G minor, K. 550, W. A. Mozart (1788). (b) Extracted violin part.
Figure 45.10 Default palette of tools in the Finale editor. On the left palette are the tools (from left to right) selection tool, zoom tool, hand grabber tool, staff tool, measure tool, key signature tool, time signature tool, simple note entry tool, speedy note entry tool, hyperscribe tool (sets up real-time MIDI input), tuplet tool, MIDI tool, smart shape tool (for slurs and other markings), expression tool, articulation tool, lyrics tool, chord tool, clef tool, repeat tool, note mover tool, resize tool, special tools (beaming, stemming, note positioning in a measure), text tool, page layout tool, ossia tool, and graphics tool. The right palette includes an eraser and various note options.
Figure 45.11 When the quantization factor is a thirty-second note, a half note must last precisely as long as sixteen tied thirty-second notes, or it is transcribed incorrectly. This figure shows the transcription that results when the duration is only fifteen thirty-second notes long. If an entire performance is transcribed this way, the notation becomes unreadable.
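The caption's arithmetic can be checked with a small quantizer. The resolution of 480 ticks per quarter note is an assumption borrowed from a typical sequencer. At that resolution, a thirty-second note is 60 ticks and a half note is 960 ticks, or sixteen thirty-seconds. Rounding to the grid absorbs small timing errors. A note held noticeably short still lands on fifteen units and cannot be notated as a single half note.

```python
TICKS_PER_QUARTER = 480                       # assumed sequencer resolution
THIRTY_SECOND = TICKS_PER_QUARTER // 8        # 60 ticks
HALF_NOTE = TICKS_PER_QUARTER * 2             # 960 ticks = 16 thirty-seconds


def quantize(ticks: int, grid: int = THIRTY_SECOND) -> int:
    """Round a non-negative duration in ticks to the nearest multiple of grid."""
    return (ticks + grid // 2) // grid * grid


for played in (955, 900):   # a half note held almost fully vs. noticeably short
    q = quantize(played)
    units = q // THIRTY_SECOND
    verdict = "half note" if q == HALF_NOTE else "not a half note"
    print(played, "->", q, "ticks =", units, "thirty-seconds:", verdict)
# 955 -> 960 ticks = 16 thirty-seconds: half note
# 900 -> 900 ticks = 15 thirty-seconds: not a half note
```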
Figure 45.12 Difference between transcribed and final score, Felix Mendelssohn, Six Songs Without Words, Op. 19, no. 2. (a) Score as transcribed automatically by a notation program. (b) Final score as corrected.
Figure 46.1 Example of unconventional music notation. Score excerpt from Karlheinz Stockhausen’s Kontakte (1960) for electronic tape. Courtesy Stockhausen Verlag. http://www.karlheinzstockhausen.org.
Figure 46.2 Extract of Pierre Schaeffer’s machine-aided transcription of an étude of musique concrète. The upper curve is a Bathygram, an intensity-versus-time trace. The lower portion shows a transcription into notation. (Courtesy of the Groupe de Recherches Musicales, Paris.)
Figure 46.3 Iconic timbre representation in the scriva editor, showing a fragment of the composition that has been encircled in order to perform an operation on the entire group.
Figure 46.4 OpenMusic maquette is a container for musical objects. Each temporal box encloses a patch that produces a musical output. Some boxes can use data coming from other boxes by means of functional connections. In this example the pitches from a chord sequence are reversed and used in the second box. Another style of maquette can control sound synthesis.
Figure 46.5 A 40-second excerpt of the score of Vaggione’s 24 variations (version 2), showing the four-track timeline designed with the IRIN program. Each rectangle represents a sound clip or sample. The vertical position of a sample within a track is not significant (i.e., it does not correspond to pitch). IRIN lets one encapsulate figures within a track and represent them as a single fragment, permitting one to build up mesostructure hierarchically.
Figure 46.6 View looking upward of Polytope de Cluny (1972), a sound and light spectacle by Iannis Xenakis. A regular grid of flash bulbs is visible with projections by multiple lasers creating geometrical forms above the heads of the audience.
Figure 46.7 Deep Web audiovisual kinetic installation by Christopher Bauder and Robert Henke installed at Kraftwerk, Berlin. Deep Web is an installation using twelve high-precision lasers and a matrix of 175 moving balloons to create a three-dimensional sculpture of lines and dots floating in space above the audience. The choreography is synced to a musical score played back in eight-channel surround sound.
Figure 46.8 JoAnn Kuchera-Morin on the bridge of the UCSB Allosphere at the control panel for her 2017 piece PROBABLY/POSSIBLY?, a collaboration with Luca Peliti, Lance Putnam, Dennis Adderton, Andres Cabrera, Kon Hyong Kim, Gustavo Rincon, Joseph Tilbian, Hannah Wolfe, Tim Wood, and Keehong Youn. The work is an interactive, immersive, visual and sound composition that tracks the probability currents and gradients of a hydrogen-like atom’s electron in superposition, combining wave functions according to the time-dependent Schrödinger equation. The composition explores symmetry and changes in symmetry as a narrative in sound and image.
Figure 46.9 The Generation of Maps (2020) by Juan Manuel Escalante. In this audiovisual performance, a generative algorithm running on a laptop computer controls a modular synthesizer patch, shown in the lower part. The main section of the projection displays a code-generated grid. Each cell of the grid has two states, active or inactive. Inactive cells appear as X marks and do not influence the sound. Active cells appear as a combination of circles and lines moving in a clockwise direction at different speeds. Each active cell controls a trigger for an envelope in the modular patch. The projection offers an alternative visualization method on the right, rendering grid cells as horizontal lines in a single column. Each line has a small indicator that moves from left to right at different speeds. Once an indicator reaches the end, it sends a trigger signal to the synthesizer and resets itself, choosing a new speed value and a new starting position. The entire system resets itself randomly with new configurations of rows, columns, and different states for each iteration.
Figure 46.10 Visualization of the spectral components of Stria (1977) by John Chowning. From a graphic analysis by Laura Zattra (2020).
Figure 46.11 A page from the graphic score of Half-life (1998) by Curtis Roads. The score was realized by James Ingram.
Figure 46.12 Acousmagraphic notation of Rosace V by François Bayle. Transcription realized by Dominique Besson. (a) Original sonogram of sound fragment. (b) Inscription of graphical symbols onto the sonogram, according to cues heard by the ear. (c) Final symbolic notation. (Courtesy of the Groupe de Recherches Musicales, Paris.)
Figure 46.13 Sonic Visualiser. (a) Options in Layers menu. (b) Four visual displays. From top: waveform, spectrogram with peak frequencies highlighted, chromagram showing pitch distribution, and spectral centroid plot.
Figure 46.14 Screenshot of TIAALS software, showing the interactive sonogram and palette.
Figure 46.15 This EAnalysis image shows an extract of Pierre Couprie’s analysis of the movement Ondes croisées from De Natura Sonorum (1975) by Bernard Parmegiani.
Figure 46.16 Ligeti’s Artikulation, graphic score published by Schott Music. The circles indicate unfiltered impulses.
Figure 47.1 Midi Quest editor for Roland D-50 Linear Synthesizer.
Figure 47.2 Granular synthesis patch in AudioMulch.
Figure 47.3 An early graphical user interface for sound synthesis, the Objed sound object editor.
Figure 47.4 Graphical editor of MITSYN for interconnecting signal processing modules. The screen is typical of vector displays.
Figure 47.5 The Expert window of the FM8 software synthesizer by Native Instruments. Users can adjust waveforms, envelopes, and other settings. The panel on the right shows the synthesis circuit as a set of interconnected signal generators. This circuit can also be edited, within the constraints of the Yamaha FM architecture.
Figure 47.6 Synthesis patch developed in Max.
Figure 47.7 REAKTOR BLOCKS modular patcher.
Figure 47.8 Hetrick modules in VCV Rack modular synthesizer.
Figure 47.9 Bitwig’s modular synthesizer: The Grid.
Figure 48.1 A software synthesis program takes in score and orchestra files and generates a sound sample file.
Figure 48.2 Connecting unit generators in software through shared data arrays in memory. The output of an osc UG is read as the input to a low_filter UG.
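The shared-array connection can be sketched in Python. Here `osc` writes one block of samples into an array, and `low_filter` reads that same array as its input. The function names echo the caption, but the signatures are hypothetical. The block size, the sample rate, and the one-pole lowpass design are assumptions chosen for illustration.

```python
import numpy as np

BLOCK = 64      # samples per block (assumed)
SR = 48_000     # sample rate (assumed)


def osc(out: np.ndarray, freq: float, phase: float) -> float:
    """Sine oscillator UG: writes one block into the shared array `out`.

    Returns the phase at which the next block should start.
    """
    incr = 2 * np.pi * freq / SR
    out[:] = np.sin(phase + incr * np.arange(len(out)))
    return (phase + incr * len(out)) % (2 * np.pi)


def low_filter(inp: np.ndarray, out: np.ndarray, coeff: float, state: float) -> float:
    """One-pole lowpass UG: reads `inp` (the oscillator's array), writes `out`."""
    for i, x in enumerate(inp):
        state += coeff * (x - state)
        out[i] = state
    return state


def render(n_blocks: int, freq: float, coeff: float) -> np.ndarray:
    """Run osc -> low_filter block by block through one shared array."""
    shared = np.zeros(BLOCK)      # osc output == low_filter input
    filtered = np.zeros(BLOCK)
    phase, state, result = 0.0, 0.0, []
    for _ in range(n_blocks):
        phase = osc(shared, freq, phase)
        state = low_filter(shared, filtered, coeff, state)
        result.append(filtered.copy())
    return np.concatenate(result)
```

Each unit generator keeps only its own state (phase or filter memory) between blocks. Their connection is nothing more than the array they both refer to.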
Figure 48.3 Orchestra language example. Definition of an instrument with an envelope for amplitude, an oscillator, and a lowpass filter. (a) Graphical representation. (b) Textual representation. The remarks surrounded by slash-asterisk (“/*” and “*/”) are comments.
Figure 48.4 Score language example. A score corresponding to the instrument shown in figure 48.3. The score consists of two parts: the two function definitions at the top, followed by the note list at the bottom.
Figure 48.5 Syntax of the function table definitions of figure 48.4.
Figure 48.6 Three ways to control a MIDI device: via its front panel, with a MIDI input device, or with a program that generates MIDI or OSC messages.
Figure 49.1 A simple MusicXML example of a single measure with the note C. The first line is a header. Lines 2–4 declare a score with parts. DTD stands for document type definition, part of the MusicXML standard. The <score-partwise> element is made up of parts, where each part is made up of measures. There is also a <score-timewise> option which is made up of measures, where each measure is made up of parts. The next lines consist of a header that lists the different musical parts in the score: one score-part, the required ID attribute for the score-part, and the required part-name element. Next, measure 1 is declared. Its attributes are listed. Then the pitch of the note C is defined, and its duration based on one division per quarter note. The <type> element tells us that this is notated as a whole note. The last four lines terminate the lexical scope of the nested elements.
Figure 49.2 J. Branciforte’s 0123 for low string quartet. In this work, a score is generated in real time using MaxScore (right screen) and displayed as scrolling notation on a performer’s computer (left screen) via network connection (Branciforte 2017).
Figure 50.1 Still frame from an animation depicting self-reference in composition (Huang et al. 2019b). The score (left-to-right piano roll notation) was already composed. This visualization shows how current note patterns were derived from past material. The gray vertical bar at right is the musical now. The concentric circles to its left visualize its reach into the past. Gray notes are currently influencing the present. Refer to https://magenta.tensorflow.org/music-transformer.
Figure 50.2 Screen image of Band-in-a-Box (PG Music), a program for composition and performance according to style templates. Given a list of chords, this program generates automatic accompaniment for bass, drums, piano, guitar, and strings according to rules formalized in a style template. The open menu lists some of the available styles.
Figure 50.3 Still image from Adrift (2011) by Lance Putnam, a real-time audiovisual composition for multichannel audio and stereo projection systems, including virtual reality and 360° domes. Adrift is an audiovisual composition made for the UCSB AlloSphere 3D immersive environment. According to the artist, the goal of the work is to allow one to experience what it could be like to be inside a mathematical space embodied through unified visual and aural sensations. The underlying algorithm is a recursive matrix multiplication that generates a continuous sequence of coordinates. Adjusting the matrix coefficients produces an endless variety of both regular and complex patterns. The coordinates are graphed in space as oriented triangles and connected in sequence with light-like rays. Sound is generated by scanning along the rays and mapping the position information to the phases of several sine oscillators. The scanner voices are spatialized to produce an immersive sound field. The work interpolates from one parameter set to another, producing an evolving visual and sonic environment.
Figure 50.5 Numeric tables from Musikalisches Würfelspiel (Musical Dice Game) by W. A. Mozart. (Edition by B. Schott’s Söhne, Mainz.) The waltz is in two parts, represented by the two matrices. The Roman numerals over the eight columns of the matrices refer to the eight phrases of each waltz, while the numerals in the rows to the left indicate the possible values of two dice when thrown. The numbers in the matrix point to measure numbers of musical phrases in another part of the score. Thus one throws the dice eight times to compose each phrase of each waltz.
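The table-lookup procedure can be sketched as follows. Each matrix has eleven rows, one for each possible two-dice total from 2 to 12, and eight columns. One throw per column selects one entry, and that entry points to a numbered measure in the score. The table values below are hypothetical placeholders, not the numbers printed in the published game.

```python
import random

# Hypothetical table: 11 rows (dice totals 2..12) x 8 columns (phrases I..VIII).
# In the printed game each entry points to a numbered measure in the score;
# these placeholder numbers are NOT the published ones.
TABLE = [[row * 8 + col + 1 for col in range(8)] for row in range(11)]


def compose_part(rng: random.Random) -> list[int]:
    """Throw two dice once per column and look up one measure per throw."""
    measures = []
    for col in range(8):
        total = rng.randint(1, 6) + rng.randint(1, 6)   # 2..12
        measures.append(TABLE[total - 2][col])
    return measures


rng = random.Random(1788)
waltz = [compose_part(rng), compose_part(rng)]   # one list per part (matrix)
```

Because two dice are summed, the totals are not equally likely: 7 comes up six times as often as 2 or 12. The middle rows of each matrix are therefore chosen most often.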
Figure 50.6 The GENIAC Electric Brain as advertised in a popular magazine in 1958.
Figure 50.7 Raymond Scott’s Electronium composing machine, 1965.
Figure 50.8 The Sal-Mar Construction (1971) set up for a concert.
Figure 50.9 Lejaren Hiller and the ILLIAC computer as portrayed in Scientific American, 1959. (Printed with permission of W. H. Freeman and Company.)
Figure 50.10 Graphical functions used to control compositional processes (after Mathews and Rosler 1969). The thick line is a scaling function for voice A; the thin line scales the influence of voice B. Notice that A dominates at first, then B enters, and then A returns to influence.
Figure 50.11 Iannis Xenakis, 1966. Getty Images.
Figure 50.12 Overall logic of Xenakis’s Stochastic Music Program.
Figure 50.13 G. M. Koenig at the teletype of the Digital Equipment Corporation PDP-15 computer, Institute of Sonology, Utrecht, the Netherlands, 1971.
Figure 50.14 Program flow of Koenig’s Project 1 program. The program generates seven structures, each of which applies the seven selection principles (SP) to each of the five parameters in a random order.
Figure 50.15 Barry Truax lecturing, 1978. (Photograph by J. V. III.)
Figure 50.16 Overall plan of the POD system. The composer specifies the selection principles, tendency masks, performance variables, and sound object definitions. The distribution algorithm scatters sounds in time within the constraints specified by the composer. The result is synthesized by computer.
Figure 50.17 Tendency masks in a POD score. The fill patterns indicate different timbral settings.
Figure 50.18 Clarence Barlow (left) explaining his algorithms to another algorithmic composer, George Lewis, in 1985.
Figure 50.19 Screen image of Barlow’s AUTOBUSK.
Figure 51.1 Hybrid control scheme. The computer generates digital envelopes that are routed via a multiplexer to several channels of DACs. The analog signals emitted by the DACs feed the control voltage inputs of the analog synthesizer modules. Here the audio output of a voltage-controlled oscillator (VCO) feeds into a voltage-controlled filter (VCF), which feeds into a voltage-controlled amplifier (VCA). A mixer combines the N synthesizer voices into a composite signal.
Figure 51.2 A basic MIDI port. The IN connector shows the standard pin numbering. The opto-isolator connected to the IN port consists of a light-emitting diode with its light output directed at a photocell, both enclosed in an opaque container. The MIDI signal pulses the light on and off, which switches the photocell on and off. The triangle labeled A is a buffer amplifier that boosts the signal before it is sent on to the next device. Vcc indicates the supply voltage. The UART is explained in the text.
Figure 51.3 Daisy chaining MIDI devices with a MIDI THRU connector. (a) Playback from a hardware sequencer to two synthesizers and a sampler. (b) To reverse the chain, that is, to record from the keyboard sampler into the sequencer, requires repatching the chain. No additional MIDI data is contributed by the two intermediate synthesizers, although they may sound as the keyboard performer plays.
Figure 51.4 A logical (not physical) view of the MIDI channel mechanism. The keyboard output is split into two channels of information, 1 and 2. In order to record a keyboard performance these two channels are routed to the computer, which runs a sequencer program. In order to hear the performance, channels 1 and 2 are routed via the MIDI computer interface to Synthesizer 1. The computer is controlling two synthesizers and one effects unit and is taking in data from a keyboard. A total of twelve MIDI channels can be used at once in this configuration. Synthesizer 1 is a ten-voice multitimbral synthesizer, whereas Synthesizer 2 has four voices, and the effects units respond to one channel each.
Figure 51.5 Format of a MIDI 1.0 message byte (after Lehrman and Tully 1993).
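MIDI 1.0 uses the most significant bit of each byte to tell status bytes from data bytes. For channel messages, the high nibble of the status byte gives the message type and the low nibble gives the channel. The classifier below is a minimal sketch of that layout. It lumps all system messages (0xF0 to 0xFF) into a single category.

```python
CHANNEL_MESSAGES = {
    0x8: "note off",
    0x9: "note on",
    0xA: "polyphonic key pressure",
    0xB: "control change",
    0xC: "program change",
    0xD: "channel pressure",
    0xE: "pitch bend",
}


def describe_midi_byte(b: int) -> tuple:
    """Classify one MIDI 1.0 byte.

    Data bytes have the top bit clear (values 0-127). Status bytes have it set;
    for channel messages the high nibble is the message type and the low nibble
    is the channel (0-15 on the wire, shown to users as 1-16).
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte")
    if b < 0x80:
        return ("data", b)
    if b >= 0xF0:
        return ("system", b)
    return (CHANNEL_MESSAGES[b >> 4], (b & 0x0F) + 1)
```

For example, 0x90 is a note-on status byte for channel 1. The data byte 0x3C that follows it is note number 60, middle C.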
Figure 51.6 Score fragment and corresponding MIDI messages. (a) J. S. Bach: Toccata from Partita VI, Clavierübung, part 1, first measure. (b) Standard MIDI file corresponding to (a) with a resolution of 480 ticks per quarter note. Delta time means number of ticks since preceding event. Hex means hexadecimal coding. That is, each four-bit nibble is indicated by a number or letter 0, 1, 2, … 9, A, B, … F corresponding to a value from 0 to 15.
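Inside a Standard MIDI File, each delta time is stored as a variable-length quantity. The value is split into 7-bit groups, most significant group first, and every byte except the last has its top bit set. The sketch below encodes and decodes this format. At 480 ticks per quarter note, a quarter-note gap of 480 ticks is stored as the two bytes 83 60 (hex).

```python
def encode_vlq(value: int) -> bytes:
    """Encode a delta time as a Standard MIDI File variable-length quantity.

    Seven bits per byte, most significant group first; every byte except the
    last has its top bit set. SMF limits values to 0x0FFFFFFF (four bytes).
    """
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("delta time out of range")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def decode_vlq(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one variable-length quantity; return (value, next position)."""
    value = 0
    while True:
        b = data[pos]
        pos += 1
        value = (value << 7) | (b & 0x7F)
        if not b & 0x80:
            return value, pos


print(encode_vlq(480).hex())   # 8360
```

Short gaps (under 128 ticks) cost only one byte. This keeps dense passages compact.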
Figure 51.8 MIDI Learn. The filter cutoff frequency in Ableton Live is mapped to a slider (top right) by enabling MIDI Learn. With MIDI Learn switched on, the user selected the parameter to be controlled (filter cutoff frequency knob at top left) and moved a slider on the hardware controller. This sets up a mapping.
Figure 51.9 Roli Seaboard Block. This MPE keyboard is sensitive to the position of the fingers on the keys and allows individual pitch bend and other expressive gestures.
Figure 51.10 Playing the Expressive E Arché software Cello with a keyboard for note selection (left) and the Touché MPE controller for articulation (right).
Figure 51.11 MIDI 2.0 UMP messages. The first four bits specify the message type.
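Reading the message type means taking the top four bits of the first 32-bit word of a Universal MIDI Packet. For most message types, the next four bits give the group. The table below lists only the message types from the original MIDI 2.0 UMP specification, with their packet sizes in 32-bit words. Later revisions of the specification add more types, so any other value is reported here as "other or reserved".

```python
# Message types (MT) from the original MIDI 2.0 UMP specification, with the
# packet size in 32-bit words. Later revisions add more types.
UMP_TYPES = {
    0x0: ("utility", 1),
    0x1: ("system real time / system common", 1),
    0x2: ("MIDI 1.0 channel voice", 1),
    0x3: ("data, including 7-bit system exclusive", 2),
    0x4: ("MIDI 2.0 channel voice", 2),
    0x5: ("data, 128-bit", 4),
}


def ump_header(word: int) -> tuple:
    """Split the first 32-bit word of a UMP into message type and group."""
    mt = (word >> 28) & 0xF          # first four bits: message type
    group = (word >> 24) & 0xF       # next four bits: group (for most types)
    name, n_words = UMP_TYPES.get(mt, ("other or reserved", None))
    return mt, group, name, n_words
```

A receiver uses the packet size to know how many more words to read before the next packet begins.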
Figure 51.12 Format of MIDI 2.0 system real-time and system common messages (top). Format of channel voice messages such as notes and control changes (bottom).
Figure 51.13 Format of MIDI 2.0 note-on message.
Figure 51.14 Format of MIDI 2.0 controller messages.
Figure 51.15 Format of MIDI 2.0 program change message.
Figure 51.16 MIDI 2.0 diagram (supplied by the MIDI Association).
Figure 52.1 An OSC message tree.
Figure 52.2 A simple Pd patch used to send messages to Csound.
Figure A.1 Time-domain waveforms of two kinds of music. Top: “Kiki.” Bottom: “Bouba.”
Figure A.2 Zero-crossing rates extracted from the two examples shown in figure A.1. Top: “Kiki.” Bottom: “Bouba.”
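A frame-wise zero-crossing rate like the one plotted here counts how often the signal changes sign within each frame. The 100 ms frame length comes from the histogram caption for figure A.3. The other choices are assumptions: frames do not overlap, samples equal to zero count as positive, and the rate is normalized per pair of adjacent samples. Noisy or bright material such as "Kiki" tends to produce higher rates than smooth, low material such as "Bouba."

```python
import numpy as np


def zero_crossing_rate(x: np.ndarray, sr: int, frame_ms: float = 100.0) -> np.ndarray:
    """Fraction of adjacent sample pairs that change sign, per non-overlapping frame."""
    n = int(sr * frame_ms / 1000.0)
    rates = []
    for start in range(0, len(x) - n + 1, n):
        negative = np.signbit(x[start:start + n])
        crossings = np.count_nonzero(negative[1:] != negative[:-1])
        rates.append(crossings / (n - 1))
    return np.array(rates)
```

Collecting these per-frame values over many recordings and passing them to `np.histogram` gives a distribution of the kind shown in figure A.3.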
图 A.3 从训练数据集中的 200 个记录的 100 毫秒帧中提取的零交叉率直方图。
Figure A.3 Histogram of zero-crossing rates extracted from 100 ms frames of the two hundred recordings in the training data set.
表格列表
List of Tables
表 2.1 频率与周期的关系
Table 2.1 Relationship of frequency to period
表 3.1 二进制数及其十进制等价物
Table 3.1 Binary numbers and their decimal equivalents
表 4.1 测量声音强度的单位
Table 4.1 Units for measuring sound magnitude
表 4.2 振幅百分比与分贝
Table 4.2 Amplitude as a percentage versus as decibels
表 6.1 阶段索引,表格查找表
Table 6.1 Phase index, table lookup list
表 9.1 早期采样仪器
Table 9.1 Early sampling instruments
表 10.1 使用加法分析/再合成的音乐转换
Table 10.1 Musical transformations using additive analysis/resynthesis
表 13.1 高级粒度组织
Table 13.1 High-level granular organization
表 17.1 切比雪夫函数 T0 至 T8
Table 17.1 Chebychev functions T0 through T8
表 20.1 主要 FOF 参数
Table 20.1 Main FOF Parameters
表 20.2 VOSIM 参数
Table 20.2 VOSIM parameters
表 22.1 SSP 中的选择原则
Table 22.1 Selection Principles in SSP
表 25.1 有色噪声
Table 25.1 Colored noises
表 25.2 Xenakis 的随机波形合成建议
Table 25.2 Xenakis’s proposals for stochastic waveform synthesis
表 26.1 混频器输入模块的功能
Table 26.1 Functions of a mixer input module
表 32.1 声波每单位时间传播的距离及相应的波长
Table 32.1 Distance traveled by sound waves per unit of time, with the corresponding wavelength
表 32.2 空间化工具
Table 32.2 Tools for Spatialization
Table 33.1 Typical parameters of reverberators
Table 36.1 MPEG-7 timbral descriptors
Table 40.1 Input devices
Table 40.2 Responsive input devices
Table 42.1 Sequence editing operations
Table 43.1 Operations in graphical sound sample editors
Table 43.2 Typical features of a DAW
Table 44.1 SoundMagic Spectral plug-ins
Table 48.1 Text-based unit-generator synthesis languages
Table 48.2 Syntax of Music 0 unit generators
Table 49.1 Procedural composition languages
Table 49.2 Live coding languages
Table 50.1 Common strategies for algorithmic composition
Table 51.1 MIDI messages (partial list)
Table 51.2 Control changes and mode changes
Table 51.3 Meta-events in MIDI files
Table 52.1 OSC message argument data types
With the use of computers and digital devices, the processes of music composition and its production have become intertwined with the scientific and technical resources of society to a greater extent than ever before. Through extensive application of computers in the generation and processing of sound and the composition of music from levels of the microformal to the macroformal, composers, from creative necessity, have provoked a robust interdependence between domains of scientific and musical thought. Not only have science and technology enriched contemporary music, but the converse is also true: problems of particular musical importance in some cases suggest or pose directly problems of scientific and technological importance, as well. Each having its own motivations, music and science depend on one another and in so doing define a unique relationship to their mutual benefit.
The use of technology in music is not new; however, it has reached a new level of pertinence with the rapid development of computer systems. Modern computer systems encompass concepts that extend far beyond those that are intrinsic to the physical machines themselves. One of the distinctive attributes of computing is programmability and hence programming languages. High-level programming languages, representing centuries of thought about thinking, are the means by which computers become accessible to diverse disciplines.
Programming involves mental processes and rigorous attention to detail not unlike those involved in composition. Thus, it is not surprising that composers were the first artists to make substantive use of computers. There were compelling reasons to integrate some essential scientific knowledge and concepts into the musical consciousness and to gain competence in areas which are seemingly foreign to music. Two reasons were (and are) particularly compelling: (1) the generality of sound synthesis by computer, and (2) the power of programming in relation to the musical structure and the process of composition.
Although the traditional musical instruments constitute a rich sound space indeed, for many decades composers’ imaginations have conjured up sounds that are based on the interpolation and extrapolation of those found in nature but that are not realizable with acoustical or analog electronic instruments. A loudspeaker controlled by a computer is the most general synthesis medium in existence. Any sound, from the simplest to the most complex, that can be produced through a loudspeaker can be synthesized with this medium. This generality of computer synthesis implies a vastly larger sound space, which has an obvious attraction to composers. The reason is that computer sound synthesis is the bridge between that which can be imagined and that which can be heard.
With the elimination of constraints imposed by the medium on sound production, there nonetheless remains an enormous barrier that the composer must overcome in order to make use of this potential. That barrier is the lack of knowledge—the knowledge required for the composer to be able to effectively instruct the computer in the synthesis process. To some extent this technical knowledge relates to computers; this is rather easily acquired. But it mostly has to do with the physical description and perceptual correlates of sound. Curiously, the knowledge required does not exist, for the most part, in those areas of scientific inquiry where one would most expect to find it, that is, physical acoustics and psychobiology; these disciplines tend to provide either inexact or no data at those levels of detail with which a composer is ultimately most concerned. In the past, scientific data and conclusions were used in attempts to replicate natural sounds as a way of gaining information about sound in general. Musicians and musician-scientists were quick to point out that most of the conclusions and data were insufficient. The synthesis of sounds that approach in aural complexity the simplest natural sound demands detailed knowledge about the temporal evolution of the various components of the sound.
Physics, psychology, computer science, and mathematics have, however, provided powerful tools and concepts. When these concepts are integrated with musical knowledge and aural sensitivity, they allow musicians, scientists, and technicians, working together, to carve out new concepts and physical and psychophysical descriptions of sound at levels of detail that are of use to the composer in meeting the exacting requirements of the ear and imagination.
As this book shows, some results have emerged: there is a much deeper understanding of timbre, and composers have a much richer sound palette with which to work; new efficient synthesis techniques have been discovered and developed that are based on modeling the perceptual attributes of sound rather than the physical attributes; powerful programs have been developed for the purposes of editing and mixing synthesized and/or digitally recorded sound; experiments in perceptual fusion have led to novel and musically useful research in sound source identification and auditory images; and finally, special-purpose computer-synthesizers are being designed and built. These real-time performance systems incorporate many advances in knowledge and technique.
Because one of the fundamental assumptions in designing a computer programming language is generality, the range of practical applications of any given high-level language is enormous and obviously includes music. Programs have been written in a variety of programming languages for various musical purposes. Those that have been most useful and with which composers have gained the most experience are programs for the synthesis and processing of sound and programs that translate musical specifications of a piece of music into physical specifications required by the synthesis program.
The gaining of some competence at programming can be rewarding to a composer because it is the key to a general understanding of computer systems. Although systems are composed of programs of great complexity and written using techniques not easily learned by nonspecialists, programming ability enables the composer to understand the overall workings of a system to the extent required for its effective use. Programming ability also gives the composer a certain independence at those levels of computing in which independence is most desirable: synthesis. Similar to the case in traditional orchestration, the choices made in the synthesis of tones, having to do with timbre and microarticulation, are often highly subjective. The process is greatly enhanced by the ability of the composer to alter synthesis algorithms freely.
The programming of musical structure is another opportunity that programming competence can provide. To the extent that compositional processes can be formulated in a more or less precise manner, they may be implemented in the form of a program. A musical structure that is based upon some iterative process, for example, might be appropriately realized by means of programming.
But there is a less tangible effect of programming competence that results from the contact of the composer with the concepts of a programming language. Whereas the function that a program is to perform can influence the choice of language in which the program is written, it is also true that a programming language can influence the conception of a program’s function. In a more general sense, programming concepts can suggest functions that might not occur to one outside of the context of programming. This is of signal importance in music composition, because the integration of programming concepts into the musical imagination can extend the boundaries of the imagination itself. That is, the language is not simply a tool with which some preconceived task or function can be accomplished; it is an extensive basis of structure with which the imagination can interact, as well. Although computer synthesis of sound involves physical and psychophysical concepts derived from the analysis of natural sounds, when joined with higher-level programming of musical structure the implications extend far beyond timbre. Unlike the condition that exists in composition for traditional instruments under which the relation of vibrational modes of an instrument is largely beyond compositional influence, computer synthesis allows for the composition of music’s microstructure.
In the context of computing, then, the microstructure of music is not necessarily of predetermined form, that is, associated with a specific articulation of a particular instrument. Rather, it can be subjected to the same thought processes and be as freely determined in the imagination of the composer as every other aspect of the work.
John Chowning
Here is The Computer Music Tutorial, Second Edition. When we first submitted a book proposal for The Computer Music Tutorial to the MIT Press in 1979, I was editor of Computer Music Journal. Three previous editors joined me in this proposal. John Snell was a music hardware engineer, and Curtis Abbott and John M. Strawn were digital signal processing mavens. My research interests at the time revolved around algorithmic composition and granular synthesis. Our general plan had been to divide up the subject matter according to each author’s interests and expertise. However, the project went slower than expected, and by 1983, my partners had dropped out due to better opportunities. I was faced with learning about areas of research that were unfamiliar to me. After a sustained period during which I was also working full time, I finished the manuscript in Paris in 1993.
The production process was long, and the book did not appear until 1996. I planned the original as a three-volume set; however, The MIT Press favored a single volume. The size and weight of the book were a practical problem, although they did not deter sales. The scope reflected my insistence on breadth. I felt it important to represent many directions of the field and not just a select few, as favored by certain people and institutions.
I have always been interested in the legacy of electronic and computer music, so a strong thread of the first edition was the history of the field. This historical scholarship was a major endeavor in itself and runs throughout the text.
Later, for the French edition published by Dunod, my colleague Jean de Reydellet suggested a more logical organization into many short chapters (Roads 1998). We used this organization in the second and third French editions (Roads 2007, 2016). I have adopted a similar organization here.
Many chapters can be read independently of one another. However, in certain cases it is helpful to read one before another. This is mentioned in the relevant chapters.
In this revised edition, the reader will find new chapters and also extensive updates based on recent research. In certain cases, it was necessary to reframe the discussion in the light of all that has transpired since the original edition was published.
Electronic and computer music is developing rapidly. The field rides waves of technological innovation. Faster processors and networks, better displays, and clever controllers have all made their mark. Improvements in audio equipment advance the field. These create a hardware foundation for innovations in software. Today, thousands of companies compete to deliver products to musicians.
Many fundamentals, however, have not changed. Vintage gear and carefully preserved software from decades past can sound wonderful today. The laws of physics that govern digital signal processing are immutable. Many new products are clever reworkings of ancient principles dressed up in fresh interfaces. Yet the power of an interface is not to be underestimated. A novel design can completely transform the method of working. The digital audio workstation (DAW) with its graphical time line interface for editing and mixing was one such innovation. Graphical patching of modules was another (e.g., Max), as was the integration of real-time spectral displays. The modular Eurorack format inspired a giant wave of creativity in synthesis and control. In each of these cases, a novel step quickly propagated and changed the field.
When it was published, The Computer Music Tutorial was one of only a handful of books devoted to the subject. Many more books have since been published. If we look solely at the catalog of MIT Press books published since 1996 and narrow it to technical books on computer music, we see a range of texts that expand on subtopics introduced in this volume: Composing Interactive Music, Microsound, The Audio Programming Book, The Csound Book, The SuperCollider Book, Designing Sound, Sonic Interaction Design, Virtual Music, Music Cognition and Computerized Sound, Computer Models of Musical Creativity, Musimathics 1 and 2, Beyond MIDI, Music Query, Music and Probability, and Musical Networks. Dozens of other books are available from different publishers.
Publishing has been profoundly transformed by the internet. The internet is an extraordinary resource if you know what to look for. A goal of this text is to foster this curiosity. Some 2,000 references support the descriptions.
First, this edition incorporates corrections. Second, after an extensive review of the literature, the content has been thoroughly updated and rewritten for greater clarity. This includes hundreds of new figures and hundreds of new references.
New chapters include virtual analog, pulsar synthesis, concatenative synthesis, spectrum analysis by atomic decomposition, Open Sound Control, spectrum editors, instrument and patch editors, and an appendix on machine learning.
New sections cover MIDI 2.0, piano models, single-sideband modulation, wavefolding, dynamic convolution, immersive sound (VBAP, ambisonics, and wave field synthesis), sidechain control and adaptive effects, additive synthesis based on machine learning, transmission formats for multichannel sound, filter banks and vocoders, fractal interpolation synthesis, scanned synthesis, digital audio workstations and audio middleware, live coding and live notation, and telematic music.
The original edition had an introductory chapter on computer programming as a general topic. Although it was well-intended, programming is hard to summarize in a chapter. Many tomes cover programming in depth and detail. The same considerations came into play when I considered a chapter specifically on audio programming. By now many books focus specifically on this topic. For example, The Audio Programming Book (Boulanger and Lazzarini 2011) contains over 3,000 pages of text and thousands of lines of code, including code listings for Csound, cmusic, and Music V. Each audio language has its own reference text. Among these are The Csound Book (Boulanger 2000), The SuperCollider Book (Wilson, Cottle, and Collins 2011), Programming for Musicians and Digital Artists: Creating Music with ChucK (Kapur et al. 2015), Electronic Music and Sound Design, Volumes 1 and 2 (Max) (Cipriani and Giri 2010a, b), The Theory and Technique of Electronic Music (PureData) (Puckette 2007), Designing Audio Objects for Max/MSP and Pd (Lyon 2012), Hack Audio (MATLAB) (Tarr 2019), and Nyquist Reference Manual (Dannenberg 2018), among others. Languages such as Faust and AudioKit are documented online. Practical how-to texts such as Linux Sound Programming (Newmarch 2017), Designing Audio Effects Plugins in C++ (Pirkle 2019), and Designing Software Synthesizer Plugins in C++ (Pirkle 2015) are available. See also Generating Sound & Organizing Time: Thinking with gen~ (Wakefield and Taylor 2022). The Audio Developer Conference is another resource, as is the Audio Programmer video channel on YouTube.
Thus our presentation of audio programming in chapter 8 is merely a pointer to a large domain.
The original edition had an appendix on psychoacoustics, a topic that is not specific to computer music. Other texts offer a more comprehensive treatment of this area (Loy 2006; Howard and Angus 2017; Bader 2018).
The old appendix on the mathematics of Fourier analysis is gone, but I merged the essential information into chapter 37, “Spectrum Analysis by Fourier Methods.” For those who want a more extended treatment, Loy (2007) devotes over one hundred pages to the same subject; Smith (2011) offers nearly six hundred pages.
I excised the dated chapter “Internals of Digital Signal Processors” with regret, especially the historical section. The pioneers of digital signal processing (DSP) hardware in the 1970s and 1980s, such as Sydney Alonso, Harold Alles, Peter Samson, and Giuseppe Di Giugno, were heroes of synthesizer engineering. Fortunately, many of these stories are told elsewhere. Joel Chadabe (1997) tells the saga of the Synclavier and several other pioneering systems. Mark Vail (2000a, b) recounts the early days of analog and digital synthesis. Bjørn and Meyer (2018) interview several modular synthesizer developers. Loy (2013a, b) tells the story of the Samson Box from an historical perspective. Giordano (2020) documents the colorful career of Di Giugno.
The situation of digital audio hardware today is quite different. DSP chips can be found in audio effects hardware, synthesizers from Yamaha, Roland, and other companies, digital Eurorack modules, and unique products such as the Kyma/Pacarana and the Universal Audio Apollo interfaces. However, most audio software today runs on standard microprocessors without DSP hardware support. Thus the topic of DSP architecture per se did not seem as central to this book as it did in the 1990s, when many personal computers had a dedicated DSP circuit board for audio processing.
Ironically, present-day audio computing faces a challenge, as central processing unit (CPU) clock speeds stalled years ago (Asanović et al. 2006). The short-term solution, adding more cores to the CPU chip, does not benefit real-time audio processing, which is not ideally suited to multicore architectures (Thall 2019). As a solution, some have predicted that audio DSP chips could make a comeback even as graphics processing units (GPUs) and tensor processing units (TPUs) are also being deployed for audio (Storer 2018; Anderson 2020). However, if these chips do not become standard, their impact will be limited. Time will tell.
At one point I thought it would be useful to have a survey chapter on the broad area of music information retrieval (MIR). Fortunately, a textbook called Fundamentals of Music Processing (Müller 2015) appeared, which covers this topic in a tutorial manner. Müller’s book describes specific methods for analysis-based applications, discussed in part V of this book. These include score following, parsing music structure, chord recognition, tempo and beat tracking, content-based audio retrieval, harmonic-percussive separation, and melody tracking. An associated website provides example code in Python (https://www.audiolabs-erlangen.de/news/articles/FMP), and Lerch (2012) covers related subjects. Also refer to Polotti and Rocchesso (2008).
Certain chapters of the book mention mechanical automation, but a full-bore treatment of the topic of musical robots was not possible. Mechanical automation has a deep history dating to the age of eighteenth-century androids built by Jacques de Vaucanson (Sousa 1906; Leichtentritt 1934; Ord-Hume 1973, 1984; Buchner 1978; Kapur 2005). Although some musical robots are little more than mechanical sequencers (such as player pianos), others are capable of human interaction by means of machine listening. Specialized conferences and books such as those by Solis and Ng (2011) focus entirely on this area of experimentation.
In the original edition, I did not specifically focus on composition. Neither do I here. It is a topic that demands its own platform; I have devoted part of my book Microsound (Roads 2001a) and all of Composing Electronic Music: A New Aesthetic (Roads 2015) to it.
The intended audience of The Computer Music Tutorial, Second Edition is the same as the original: music students, but also engineers and scientists seeking orientation to computer music. The word orientation is key, as is the word tutorial in the title.
Many of the topics discussed in the fifty-two chapters herein could be expanded into book-length treatises. The goal of this book was the opposite: to sift through the research literature, sort out the most fundamental facts, and craft a clear and concise technical narrative that would be understandable by a novice. This tutorial is not intended as a comprehensive resource for advanced developers of computer music algorithms. We aim to introduce the field, explain its motivations, put topics into context, and provide references for further study.
For more than twenty-five years I have used The Computer Music Tutorial in my year-long introductory course on electronic and computer music. Judging from this experience, I can testify that the pace and the level of this book are well matched to an introductory course.
In the first edition of the book I was careful to make the explanations generic and not tied to specific hardware and software, which can become obsolete. The field is more stable now. Thus in this revised edition, I still keep the theory generic, but I decided to cite more products as examples in order to make the descriptions more concrete. Many of these products have been around for more than two decades. Of course, although certain products are mentioned as examples, this text is not meant to be a product survey.
One of the reviewers of the manuscript brought up the thorny issue of diversity in computer music. This has long been a concern in the field, as pointed out by Mary Simoni’s 1995 survey of gender issues in electronic and computer music. These concerns are echoed in more recent studies such as Xambó (2018) and Sofer (2022). Similar issues can be raised concerning other underrepresented groups. The Computer Music Tutorial, Second Edition is focused primarily on technical research. It reflects the scholarly literature, which historically has not been as diverse as one would hope. Thus I have made an explicit effort to mention and cite diverse contributors when possible.
The Computer Music Tutorial was published immediately before I started teaching at the University of California, Santa Barbara. I would like to thank my UCSB colleagues at the Center for Research in Electronic Art Technology (CREATE). I also greatly appreciate the support from my colleagues in the Media Arts and Technology (MAT) Graduate Program and the Department of Music at UCSB.
For this edition, my former student and research colleague, Professor Bob L. T. Sturm (KTH Royal Institute of Technology, Stockholm) kindly agreed to assist with revisions. Specifically, he revised several chapters and contributed two new chapters.
I would also like to thank my former CREATE colleague Dr. Matthew J. Wright (Center for Computer Research in Music and Acoustics, Stanford) for his collaboration on chapter 52, “Open Sound Control.”
Chapter reviewers included Aaron Anderson, Clarence Barlow, Stefan Bilbao, Thom Blum, Ludger Brümmer, Nick Collins, Jean de Reydellet, Rodney Duplessis, Kramer Elwell, Stewart Engart, Tom Erbe, Yuan-Yi Fan, Susan Frykberg, Stefanie Ku, JoAnn Kuchera-Morin, Elizabeth Hambleton, Michael Hetrick, Francisco Iovino, Christopher Jette, Garry Kling, Lawrence Kolasa, Ryan McGee, João Pedro Oliveira, Brian O’Reilly, Robert Owens, Chris Ozley, Brandon Rolle, David Romblom, Giorgio Sancristoforo, Ron Sedgwick, Atau Tanaka, Bruce Wiggins, Michael Winter, and Karl Yerkes. Dr. Tim Wood wrote scripts to verify the citations and references. Thanks to Federico Llach for his consultation on the Max example in chapter 45.
This book includes corrections to the original edition kindly supplied by Keiji Hirata, Takafumi Hikichi, James McCartney, and Graham Hadfield.
I am grateful to Professor Keiji Hirata of the University of Tokyo for organizing the Japanese edition, which appeared in 2001. Alongside Professor Hirata were translators Tatsuya Aoyagi, Naotoshi Osaka, Masataka Goto, Takafumi Hikichi, Saburo Hirano, Yasuo Horiuti, and Toshiaki Matsushima.
I thank Dr. Ken Fields, Adjunct Professor in Media Arts and Technology at UCSB and Professor at the Central Conservatory of Music in Beijing, for organizing the Chinese edition, which appeared in 2011. I thank the translators of the Chinese edition, Chang Wei, Chen Yang, Cheng Yibing, Hu Ze, Huang Zhipeng, Jiang Hao, Li Sixin, Li Yueling, Qi Gang, Yang Renying, Zhang Ruibo (Mungo), and the proofreaders, Jin Ping, Li Sixin, and Qi Gang.
Finally, I would like to thank the teams at MIT Press and Westchester Publishing Services for their assistance with the production of this book.
Music changes: new forms appear in infinite variety, and reinterpretations infuse freshness into old genres. Waves of musical cultures overlap, diffusing new stylistic resonances. Techniques for playing and composing music meander with these waves. Bound with the incessant redevelopment in music making is an ongoing evolution in music technology. For every music there is a family of instruments, so that today we have hundreds of instruments to choose from, even if we restrict ourselves to the acoustic ones.
In the twentieth century, electronics turned the stream of instrument design into a boiling rapids. Electrification transformed the guitar, bass, piano, organ, and drum (machine) into the folk instruments of industrial society. Analog synthesizers expanded the musical sound palette and launched a round of experimentation with sound materials. But analog synthesizers were limited by a lack of programmability, precision, memory, and intelligence. Because it offers these capabilities, the digital computer provides an expanded set of brushes and implements for manipulating sound color. It can listen, analyze, and respond to musical gestures in sophisticated ways. It lets musicians edit music or compose according to logical rules and print the results in music notation. It can teach interactively and demonstrate all aspects of music with sound and images. New musical applications continue to spin out of computer music research.
In the wake of ongoing change, musicians confront the challenge of understanding the possibilities of the medium and keeping up with new developments. The Computer Music Tutorial addresses the need for a standard and comprehensive text of basic information on the theory and practice of computer music. As a complement to the reference volumes Foundations of Computer Music (edited with John M. Strawn, MIT Press, 1985) and The Music Machine (MIT Press, 1989), this book provides the essential background necessary for advanced exploration of the computer music field. While Foundations of Computer Music and The Music Machine are anthologies, this textbook contains all new material directed toward teaching purposes.
The intended audience for this book is not only music students but also engineers and scientists seeking an orientation to computer music. Many sections of this volume open technical “black boxes,” revealing the inner workings of software and hardware mechanisms. Why is technical information relevant to the musician? Our goal is not to turn musicians into engineers but to make them better informed and more skillful users of music technology. Technically naive musicians sometimes have unduly narrow concepts of the possibilities of this rapidly evolving medium; they may import conceptual limitations of bygone epochs into a domain where such restrictions no longer apply. For want of basic information, they may waste time dabbling, not knowing how to translate intuitions into practical results. Thus one aim of this book is to impart a sense of independence to the many musicians who will eventually set up and manage a home or institutional computer music studio.
For some musicians, the descriptions herein will serve as an introduction to specialized technical study. A few will push the field forward with new technical advances. This should not surprise anyone who has followed the evolution of this field. History shows time and again that some of the most significant advances in music technology have been conceived by technically informed musicians.
The knowledge base of computer music draws from composition, acoustics, psychoacoustics, physics, signal processing, synthesis, performance, computer science, and electrical engineering. Thus, a well-rounded pedagogy in computer music must reflect an interdisciplinary spirit. In this book, musical applications motivate the presentation of technical concepts, and the discussion of technical procedures is interspersed with commentary on their musical significance.
One goal of our work has been to convey an awareness of the heritage of computer music. Overview and background sections place the current picture into historical context. Myriad references to the literature point to sources for further study and also highlight the pioneers behind the concepts.
Every music device and software package uses a different set of protocols—terminology, notation system, command syntax, button layout, and so on. These differing protocols are built on the fundamental concepts explained in this volume. Given the myriad incompatibilities and the constantly changing technical environment, it seems more appropriate for a book to teach fundamental concepts than to spell out the idiosyncrasies of a specific language, software application, or synthesizer. Hence, this volume is not intended to teach the reader how to operate a specific device or software package—that is the goal of the documentation supplied with each system. But it will make this kind of learning much easier.
The Computer Music Tutorial has been written as a general textbook, aimed at presenting a balanced view of the field in its current state. It is designed to serve as a core text and should be easily adaptable to a variety of teaching situations. In the ideal situation, this book should be assigned as a reader in conjunction with a studio environment where students have ample time to try out the various ideas within. Every studio favors particular tools (such as computers, software, synthesizers, etc.), so the manuals for those tools, along with studio-based practical instruction, should round out the educational equation.
Notwithstanding the broad scope of this book, it was impossible to compress the art of composition into a single part. Instead, readers will find many citations to composers and musical practices interwoven with technical discussions. Chapters 18 and 19 present the technical principles behind algorithmic composition, but this is only one facet of a vast—indeed open-ended—discipline, and is not necessarily meant to typify computer music composition as a whole.
We have surveyed composition practices in other publications. Composers and the Computer focuses on several musicians (Roads 1985a). During my tenure as the editor of Computer Music Journal (1978–1989), we published many reviews of compositions as well as interviews with and articles by composers. These include a “Symposium on composition,” with fourteen composers participating (Roads 1986b), and a special issue on composition, 5(4) 1981. Some of these articles are collected in a widely available anthology, The Music Machine (MIT Press 1989). Issue 11(1) 1987 featured microtonality in computer music composition. Many other periodicals and books contain informative articles on compositional issues in electronic and computer music.
In a tutorial volume that covers many topics, it is essential to supply pointers for further study. This book contains extensive citations and a reference list of more than 1,400 entries at the back of the volume. As a further service to readers, we have invested much time in ensuring that both the name and subject indexes are comprehensive.
Since this Tutorial is addressed primarily to a musical audience, we chose to present technical ideas in an informal style. The book uses as little mathematical notation as possible. It keeps code examples brief. When mathematical notation is needed, it is presented with operators, precedence relations, and groupings specified explicitly for readability. This is important because the idioms of traditional mathematical notation are sometimes cryptic at first glance or are incomplete as algorithmic descriptions. For the same reasons, the book usually uses long variable names instead of the single-character variables favored in proofs. With the exception of a few simple Lisp examples, code examples are presented in a Pascal-like pseudocode.
In a large book covering a new field, there will inevitably be errors. We welcome corrections and comments, and we are always seeking further historical information. Please send comments and corrections via email to the author at clangtint@gmail.com.
This book was written over a period of many years. I wrote the first draft from 1980 to 1986, while serving as a research associate in computer music at the Massachusetts Institute of Technology and as editor of Computer Music Journal for The MIT Press. I am grateful to many friends for their assistance during the period of revisions that followed.
Major sections of part IV (“Mixing and Signal Processing”) and part V (“Sound Analysis”) were added during a 1988 stay as visiting professor in the Department of Physics at the Università di Napoli Federico II, thanks to an invitation by Professor Aldo Piccialli. I am deeply grateful to Professor Piccialli for his detailed comments and generous counsel on the theory of signal processing.
Valuable feedback on part III (“Sound Synthesis”) came from composition students in the Department of Music at Harvard University, where I taught in 1989, thanks to Professor Ivan Tcherepnin. I thank Professors Conrad Cummings and Gary Nelson for the opportunity to teach at the Oberlin Conservatory of Music in 1990, where I presented much of the book in lecture form, which led to clarifications in the writing.
During spare moments I worked on part VI (“The Musician’s Interface”) in Tokyo at the Center for Computer Music and Music Technology, Kunitachi College of Music, in 1991, thanks to the center’s director Cornelia Colyer, Kunitachi chairman Bin Ebisawa, and a commission for a composition from the Japan Ministry of Culture. Final refinements to the book were carried out in Paris. I presented the first courses based on the completed text in 1993 and 1994 at Les Ateliers UPIC, thanks to Gerard Pape and Iannis Xenakis, and the Music Department of the University of Paris 8, thanks to Professor Horacio Vaggione.
John M. Strawn, formerly my editorial colleague at Computer Music Journal, contributed substantially to this project for several years. In between his duties as a doctoral student at Stanford University, he contributed much to part I (“Digital Audio”). Later, he reviewed drafts of most chapters with characteristic thoroughness. Throughout this marathon effort, John consulted on myriad details via electronic mail. I am grateful to him for sharing his wide musical and technical knowledge and sharp wit.
Many kind individuals helped by supplying information, documentation, and photographs or by reading chapter drafts. I am profoundly indebted to these generous people for their myriad suggestions, criticisms, and contributions to this book: Jean-Marie Adrien, Jim Aiken, Clarence Barlow, François Bayle, James Beauchamp, Paul Berg, Nicola Bernardini, Peter Beyls, Jack Biswell, Thom Blum, Richard Boulanger, David Bristow, William Buxton, Wendy Carlos, René Caussé, Xavier Chabot, John Chowning, Cornelia Colyer, K. Conklin, Conrad Cummings, James Dashow, Philippe Depalle, Mark Dolson, Giovanni De Poli, Gerhard Eckel, William Eldridge, Gianpaolo Evangelista, Ayshe Farman-Farmaian, Adrian Freed, Christopher Fry, Guy Garnett, John W. Gordon, Philip Greenspun, Kurt Hebel, Henkjan Honing, Gottfried Michael Koenig, Paul Lansky, Otto Laske, David Lewin, D. Gareth Loy, Max V. Mathews, Stephen McAdams, Dennis Miller, Diego Minciacchi, Bernard Mont-Reynaud, Robert Moog, F. R. Moore, James A. Moorer, Peter Nye, Robert J. Owens, Alan Peevers, Aldo Piccialli, Stephen Pope, Edward L. Poulin, Miller Puckette, Thomas Rhea, Jean-Claude Risset, Craig Roads, Xavier Rodet, Joseph Rothstein, William Schottstaedt, Marie-Hélène Serra, John Snell, John Stautner, Morton Subotnick, Martha Swetzoff, Karen Tanaka, Stan Tempelaars, Daniel Teruggi, Irène Thanos, Barry Truax, Alvise Vidolin, Dean Wallraff, David Waxman, Erling Wold, and Iannis Xenakis.
I would also like to express my thanks to the staff of The MIT Press Journals—Janet Fisher, manager—publishers of Computer Music Journal. This work would have been nigh impossible without their backing over the past fourteen years.
I will always be grateful to Frank Urbanowski, director of The MIT Press, and executive editor Terry Ehling for their extraordinarily patient and kind support of this project.
I am also indebted to Sandra Minkkinen and the production staff of The MIT Press for their fine editing and production labors.
This book is dedicated to my mother, Marjorie Roads.
History of Analog Audio Recording
Experimental Digital Audio Recording
Downloadable Audio File Formats
Origins of Digital Multitrack Recording
This chapter presents a basic introduction to the history and technology of digital audio recording and playback.
Sound recording has a rich history, beginning with Léon Scott’s phonautograph of 1857, which could record a sound waveform but not play it back. Thomas Edison’s experiments in the 1870s combined sound recording with playback on wax cylinders. Emile Berliner’s gramophone (1887) recorded on rotating discs, the precursor to the long-play (LP) vinyl discs still being made (Read and Welch 1976). Early audio recording was a mechanical process (figure 1.1). Air vibrations caused a membrane to vibrate, and these vibrations were traced in a soft medium such as a rotating wax cylinder by a stylus attached to the membrane.
Figure 1.1 Mechanical recording session before 1900. Sound vibrations picked up by the large cone over the piano were transduced into vibrations of a cutting stylus piercing a rotating wax cylinder.
Although the invention of the triode vacuum tube in 1906 launched the era of electronics, electronically produced recordings did not become practical until 1924 (Keller 1981). Figure 1.2 depicts one of the horn-loaded loudspeakers that were common in the 1920s.
Figure 1.2 Amplion loudspeaker, as advertised in 1925.
Optical sound recording on film was first demonstrated in 1922 (Ristow 1993). Sound recording on tape coated with powdered magnetized material was developed in the 1930s in Germany (figure 1.3) but did not reach the rest of the world until after World War II. The German magnetophon tape recorders were a great advance over previous wire and steel band recorders, which required hard soldering or welding to make a splice.
Figure 1.3 Prototype of a portable magnetophon tape recorder from 1935, made by AEG. (Photograph courtesy of BASF Aktiengesellschaft.)
The magnetophons and their descendants were analog recorders, so called because the waveform encoded on tape is a close analogy to the original sound waveform picked up by a microphone. If we could view the magnetized particles on the tape, they would form a pattern resembling the sound waveform. Analog recording is still favored by some for its special quality of sound. However, analog recording faces fundamental physical limits. These limits are most apparent in copying from one analog medium to another—additional noise is inescapable.
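The point that copying between analog media inescapably adds noise can be illustrated with a toy simulation. This is a minimal sketch under a simplifying assumption of our own: each analog generation adds fresh, independent Gaussian noise at a fixed RMS level, while a digital copy reproduces the stored numbers exactly. The function names are hypothetical.

```python
import math
import random


def analog_copy(signal, noise_rms, rng):
    """Simplified model: every analog dub adds new, independent Gaussian noise."""
    return [x + rng.gauss(0.0, noise_rms) for x in signal]


def digital_copy(samples):
    """A digital copy duplicates the stored numbers exactly."""
    return list(samples)


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def copy_generations(original, generations, noise_rms, seed=1):
    """Return (analog_errors, digital_errors), one RMS error per copy generation."""
    rng = random.Random(seed)
    analog, digital = original, original
    analog_errors, digital_errors = [], []
    for _ in range(generations):
        analog = analog_copy(analog, noise_rms, rng)   # copy of the previous copy
        digital = digital_copy(digital)
        analog_errors.append(rms_error(original, analog))
        digital_errors.append(rms_error(original, digital))
    return analog_errors, digital_errors


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * n / 100) for n in range(10_000)]
    analog_err, digital_err = copy_generations(tone, generations=4, noise_rms=0.01)
    for gen, (a, d) in enumerate(zip(analog_err, digital_err), start=1):
        print(f"generation {gen}: analog error {a:.4f}, digital error {d:.4f}")
```

Under this model, noise from independent generations adds in power, so the analog error after n copies grows roughly as the square root of n times the per-copy noise level, while the digital error stays at zero.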
For more on the history of analog recording, see chapter 26 on mixing.
The core concept in digital audio recording is sampling, that is, converting continuous analog signals (such as those coming from a microphone) into discrete time-sampled signals (samples). Each sample is nothing more than a number—a snapshot of a sound waveform (figure 1.4).
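To make the idea of a sample concrete, here is a minimal sketch that takes snapshots of a mathematically defined sine wave at a fixed sampling rate. A real recorder measures a microphone voltage rather than evaluating a formula; the function name and parameters are our own, chosen for illustration.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return a list of samples of a sine wave: one number per sampling instant."""
    num_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the nth snapshot, in seconds
        samples.append(amplitude * math.sin(2 * math.pi * freq_hz * t))
    return samples


if __name__ == "__main__":
    s = sample_sine(1000.0, 48_000, 0.001)
    print(len(s))
    print([round(x, 3) for x in s[:4]])
```

At 48 kHz, one millisecond of sound becomes 48 numbers; the sample at index 12 falls a quarter of the way through a 1 kHz cycle and so lands on the waveform's peak.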
Figure 1.4 Zoomed in to show individual samples as they appear in a sound editor. The sound editor drew a line through them to enhance the display. The samples are all positive in amplitude; the line in the center represents 0 amplitude. The time span shown is about 700 µs (less than a thousandth of a second).
The theoretical underpinning of sampling is the sampling theorem, which specifies the relation between the sampling rate and the audio bandwidth. It is also called the Nyquist theorem after the work of Harold Nyquist of Bell Telephone Laboratories (Nyquist 1928), but another form of this theorem was first stated in 1841 by the French mathematician Augustin Louis Cauchy (1789–1857). The British researcher A. Reeves developed the first patented pulse-code-modulation (PCM) system for transmission of messages in amplitude-dichotomized, time-quantized (digital) form (Reeves 1938; Licklider 1950; Black 1953). Even today, digital recording is sometimes called PCM recording. The development of information theory contributed to the understanding of digital audio transmission (Shannon 1948). Solving the difficult problems of converting between analog signals and digital signals took decades and is still being improved. Chapters 3 and 4 describe the conversion processes.
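A consequence of the sampling theorem is that a sampling rate fs can represent frequencies only up to fs/2, the Nyquist frequency. A component above that limit does not vanish; it reappears (aliases) at a lower frequency. The sketch below computes where a sinusoid lands after sampling and checks numerically that the two sets of samples coincide. It assumes an ideal sampler with no anti-aliasing filter; the helper names are hypothetical.

```python
import math


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency (0 to fs/2) at which a sampled sinusoid of freq_hz appears."""
    nyquist = sample_rate_hz / 2
    folded = freq_hz % sample_rate_hz  # sampled spectra repeat every fs
    return sample_rate_hz - folded if folded > nyquist else folded


def sampled_cosine(freq_hz, sample_rate_hz, count):
    """First `count` samples of a unit cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


if __name__ == "__main__":
    fs = 44_100
    high = 30_000                    # above the 22,050 Hz Nyquist frequency
    low = alias_frequency(high, fs)  # where the 30 kHz tone appears after sampling
    a = sampled_cosine(high, fs, 100)
    b = sampled_cosine(low, fs, 100)
    print(low, max(abs(x - y) for x, y in zip(a, b)))
```

Sampled at 44.1 kHz, a 30 kHz cosine yields the same numbers as a 14.1 kHz cosine, which is why recorders filter out content above the Nyquist frequency before conversion.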
Primitive methods of generating sounds, such as placing a radio next to a computer and running programs that loop at audio frequencies to create melodies, date to the early days of electronic computing (Doornbusch 2005). These experiments were very limited in their sound quality.
In the late 1950s, Max Mathews and his group at Bell Telephone Laboratories generated the first sample-based sounds from a digital computer (David, Mathews, and McDonald 1959). Representing waveforms as samples made it possible to generate any waveshape. The computer wrote the samples to expensive and bulky reel-to-reel computer tape drives. Producing sound from the numbers was a separate step: the tape was played back through a custom-built 12-bit vacuum tube digital-to-sound converter developed by the Epsco Corporation (Roads 1980). Today we call a playback device that translates digital numbers into analog voltages a digital-to-analog converter (DAC). By contrast, an analog-to-digital converter (ADC) encodes an analog waveform into digital numbers.
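An ADC and a DAC can be pictured as a pair of functions: one turns a continuous value into an integer code, the other turns the code back into a value. The sketch below models an idealized converter of a chosen word length, using 12 bits as in the Epsco converter. It is a simplification we assume for illustration: real converters add filtering, dither, and nonidealities, and this symmetric mapping leaves the most negative two's-complement code unused. The function names are hypothetical.

```python
def adc(value, bits):
    """Idealized ADC: map a value in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1        # 2047 for a 12-bit converter
    clipped = max(-1.0, min(1.0, value))  # inputs beyond full scale are clipped
    return round(clipped * max_code)


def dac(code, bits):
    """Idealized DAC: map an integer code back to a value in [-1.0, 1.0]."""
    max_code = 2 ** (bits - 1) - 1
    return code / max_code


if __name__ == "__main__":
    for v in (0.0, 0.3, 1.0, 1.7):
        code = adc(v, 12)
        print(v, code, dac(code, 12))
```

The round-trip error is at most half a quantization step (here 1/2047 of full scale, halved), which is why longer word lengths yield a more faithful reconstruction.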
A practical digital audio recording medium needs to be robust, that is, resilient when errors occur. Methods for error detection and correction were developed in the 1950s and 1960s and widely used in communications. Later, Sato, Blesser, Stockham, and Doi applied error correction in the first practical digital audio recorders and players. The first dedicated one-channel digital audio recorder (based on a videotape mechanism) was demonstrated by NHK, the Japanese broadcasting company (Nakajima et al. 1983). Soon thereafter, Denon developed an improved version (figure 1.5), and the race began to bring digital audio recorders to market (Iwamura et al. 1973).
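Error correction works by adding redundant bits that let a player locate and repair damaged data. As a small illustration, the sketch below implements the classic Hamming (7,4) code, which protects four data bits with three parity bits and can correct any single flipped bit. It is a textbook example chosen for brevity, not the scheme used in the recorders named above; commercial formats rely on much stronger methods (the CD, for instance, uses cross-interleaved Reed-Solomon coding).

```python
def hamming74_encode(data):
    """Encode 4 data bits as 7 code bits laid out as positions 1..7."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position 0 means no error found."""
    c = list(codeword)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], error_pos


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # simulate a dropout flipping one bit
    print(sent, received, hamming74_decode(received))
```

The three parity checks overlap so that each possible single-bit error produces a distinct pattern of failed checks, and that pattern, read as a binary number, is the position of the bad bit.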
Figure 1.5 Nippon Columbia (Denon) digital audio recorder made in 1973, based on a 1-inch videotape recorder (right).
By 1977 the first commercial recording system appeared, the Sony PCM-1, designed to encode 13-bit digital audio signals onto Sony videocassette recorders. Within a year this was displaced by 16-bit PCM encoders such as the $40,000 Sony PCM-1600 for the professional recording market (Nakajima et al. 1978).
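Why did the move from 13 to 16 bits matter? A widely used rule of thumb, not stated in the text above, is that an ideal N-bit quantizer driven by a full-scale sine wave has a signal-to-quantization-noise ratio of about 6.02N + 1.76 dB. The sketch below evaluates it; it assumes uniform quantization noise and ignores dither and converter imperfections.

```python
import math


def quantization_snr_db(bits):
    """Ideal SNR of a full-scale sine through an N-bit uniform quantizer.

    Signal power (full-scale sine) divided by quantization-noise power (step**2 / 12)
    works out to 1.5 * 2**(2 * bits), roughly 6.02 * bits + 1.76 dB.
    """
    return 10 * math.log10(1.5 * 2 ** (2 * bits))


if __name__ == "__main__":
    for bits in (12, 13, 16):
        print(f"{bits} bits: {quantization_snr_db(bits):.1f} dB")
```

By this estimate, the three extra bits bought roughly 18 dB of additional dynamic range.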
The Audio Engineering Society established two standard sampling frequencies in the 1980s: 44.1 and 48 kHz. This became known as the AES/EBU or AES3 standard after it was endorsed by the European Broadcasting Union. Since then, the standard has been successively revised to incorporate higher sampling rates such as 88.2, 96, 176.4, and 192 kHz (Audio Engineering Society 2008). A variety of high-resolution recorders are now available, including hand-held field recorders with built-in microphones.
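Each doubling of the sampling rate doubles both the Nyquist frequency and the amount of data to store. The sketch below tabulates both for the standard rates. The 24-bit stereo format in the example is our assumption for illustration; the bit-rate formula itself is simply samples per second times bits per sample times channels.

```python
STANDARD_RATES_HZ = [44_100, 48_000, 88_200, 96_000, 176_400, 192_000]


def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at this sampling rate."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for fs in STANDARD_RATES_HZ:
        mbps = pcm_bit_rate(fs, bits_per_sample=24, channels=2) / 1e6
        print(f"{fs / 1000:g} kHz: Nyquist {nyquist_hz(fs) / 1000:g} kHz, "
              f"24-bit stereo {mbps:.3f} Mbit/s")
```

For example, 16-bit stereo at 44.1 kHz requires 1,411,200 bits per second, while 24-bit stereo at 192 kHz requires 9,216,000.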
In 1980, computer sound research was still in its infancy, and the standard consumer format for recorded music was the vinyl long-play (LP) record, introduced in 1948. Apart from a small number of audiophile-quality LPs, the majority of LPs manufactured by record companies in the 1970s and 1980s were not of outstanding quality, due to compromised mastering and manufacturing practices. Thus, the market was ripe for an improved format.
Digital sound first reached the general public in 1982 by means of the compact disc (CD) format, a 12 cm optical disc read by a laser. The CD format was developed jointly by the Philips and Sony corporations after years of development. It was a tremendous commercial success, selling over 1.35 million players and tens of millions of discs within two years (Pohlmann 1989a). Part of the success of the CD can be traced to its adoption by the computer industry as a general means of distributing software and storing data, the CD read-only memory (CD-ROM).
Introduced in 2000, the specialized DVD-audio (DVD-A) and super audio compact disc (SACD) optical disc formats offered improved sound quality, but neither format gained a foothold in the market. The Blu-ray disc format, first introduced in 2002, is the same physical size as the CD and DVD. Blu-ray supports high-definition 3D video, up to 128 gigabytes of data, and up to eight channels of high-resolution digital audio (up to 24 bits per sample at a 96 kHz sampling rate). Blu-ray as an audio-only format was announced in 2013 but was not successful.
The rise of the internet fostered the evolution of downloadable audio file formats. Throughout the 1990s, most internet users relied on slow dial-up connections through telephone modems. In response to these limitations, the popular MP3 format was developed under the official name MPEG Audio Layer III, an international standard. MP3 uses a lossy compression algorithm to drastically reduce the size of a sound file (Pohlmann 2005). To reduce the amount of data, audio information is discarded in the encoding phase. This results in a loss of audio fidelity.
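The scale of the reduction is easy to estimate with arithmetic. The sketch below compares the data rate of 16-bit, 44.1 kHz stereo PCM with an MP3 stream; the 128 kbit/s MP3 bit rate and the four-minute song length are assumed, illustrative values rather than figures from the text, and MP3 encoders offer a range of bit rates.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def size_megabytes(bit_rate_bps, seconds):
    """Storage needed for `seconds` of audio at a constant bit rate (1 MB = 10**6 bytes)."""
    return bit_rate_bps * seconds / 8 / 1_000_000


if __name__ == "__main__":
    pcm_bps = pcm_bit_rate(44_100, 16, 2)  # uncompressed stereo PCM
    mp3_bps = 128_000                      # assumed MP3 bit rate
    song_s = 4 * 60                        # a hypothetical four-minute song
    print(f"compression ratio {pcm_bps / mp3_bps:.1f}:1")
    print(f"PCM {size_megabytes(pcm_bps, song_s):.1f} MB, "
          f"MP3 {size_megabytes(mp3_bps, song_s):.1f} MB")
```

At these settings, roughly 90 percent of the data is discarded, which made MP3 practical over dial-up connections but means the original samples cannot be recovered from the file.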
By contrast, lossless compression reduces the size of the file for the purpose of storage and transmission while allowing the original file to be reconstructed exactly. The free lossless audio codec (FLAC) is an example of a lossless compression format. Digital audio files compressed by FLAC’s algorithm can be reduced in size by around 50 percent (Xiph 2015).
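The defining property of lossless compression is exact reconstruction. The sketch below demonstrates it with a deliberately simple scheme of our own: predict each sample from the previous one, store only the prediction residuals, and compress them with Python's general-purpose zlib module. FLAC likewise predicts samples and encodes the residuals, but with different predictors and a different entropy coder; this is not the FLAC algorithm, and its compression ratio will differ.

```python
import math
import struct
import zlib


def encode_lossless(samples):
    """First-difference prediction followed by zlib. `samples` is a list of ints."""
    previous = [0] + samples[:-1]
    residuals = [s - p for s, p in zip(samples, previous)]
    # Residuals of 16-bit audio can need 17 bits, so pack them as 32-bit ints.
    packed = struct.pack(f"<{len(residuals)}i", *residuals)
    return zlib.compress(packed, 9)


def decode_lossless(blob):
    """Undo zlib, then add each residual to the previously decoded sample."""
    raw = zlib.decompress(blob)
    residuals = struct.unpack(f"<{len(raw) // 4}i", raw)
    samples, current = [], 0
    for r in residuals:
        current += r
        samples.append(current)
    return samples


if __name__ == "__main__":
    audio = [round(10_000 * math.sin(2 * math.pi * 440 * n / 44_100)) for n in range(44_100)]
    blob = encode_lossless(audio)
    print(f"16-bit PCM: {2 * len(audio)} bytes, compressed: {len(blob)} bytes")
    print("exact reconstruction:", decode_lossless(blob) == audio)
```

Prediction helps because neighboring audio samples are usually similar, so the residuals are small numbers that a general-purpose compressor can store more compactly than the raw samples.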
New media formats are constantly appearing, partly because there are commercial incentives to introduce proprietary systems. See chapter 4 for more on audio file formats.
High-quality DACs attached to personal computers came on the scene around 1988. At the same time, standard formats for audio files appeared, including Sound Designer II (SDII), audio interchange file format (AIFF), and waveform audio format (WAVE). These developments heralded a major new era for computer music. In a short period, sound synthesis, recording, and processing by personal computer became widespread. Many different digital audio workstations (DAWs) reached the musical marketplace, including Pro Tools, Cubase, Digital Performer, Logic, and others. DAWs let musicians record music onto a hard disk connected to a personal computer. This music could be precisely edited using a timeline display on the computer screen with playback from the hard disk.
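Standard file formats such as WAVE wrap the raw samples in a small header that records the sampling rate, word length, and channel count. As a minimal sketch, Python's standard-library wave module can write a 16-bit mono WAVE file; the helper name and the 440 Hz test tone are our own choices.

```python
import io
import math
import wave
from array import array


def write_wav(destination, samples, sample_rate_hz):
    """Write 16-bit mono PCM samples (ints) to a path or binary file object."""
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    with wave.open(destination, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        # array("h") holds native-order 16-bit ints; wave converts to little-endian if needed.
        wav.writeframes(array("h", clipped).tobytes())


if __name__ == "__main__":
    rate = 44_100
    tone = [round(8_000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
    buffer = io.BytesIO()  # in-memory stand-in for a file on disk
    write_wav(buffer, tone, rate)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wav:
        print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
```

Passing a file path such as "tone.wav" instead of the in-memory buffer writes an ordinary WAVE file that a DAW or sound editor can open.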
Multitrack recorders have a number of discrete channels or tracks that can be recorded at different times. Each track can record, for example, a separate instrument, which allows flexibility when the tracks are later mixed together. Another advantage of multitrack machines is that they let musicians build recordings in several layers; each new layer is an accompaniment to previously recorded layers.
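The flexibility of mixing recorded tracks afterward rests on simple arithmetic: each track is scaled by its own gain, and the results are summed sample by sample. Below is a minimal sketch. It assumes that tracks are equal-length lists of floating-point samples in the range −1 to 1. The track contents and gains are invented for the example.

```python
def mixdown(tracks, gains):
    """Sum tracks sample by sample, each scaled by its own gain, clipping to [-1, 1]."""
    if len(tracks) != len(gains):
        raise ValueError("need one gain per track")
    mix = []
    for i in range(len(tracks[0])):
        s = sum(g * t[i] for t, g in zip(tracks, gains))
        mix.append(max(-1.0, min(1.0, s)))  # hard clip so the sum stays in range
    return mix

vocals = [0.5, 0.25, -0.5, 0.0]
bass = [0.25, 0.25, 0.25, 0.25]
print(mixdown([vocals, bass], [1.0, 0.5]))  # [0.625, 0.375, -0.375, 0.125]
```

Because the tracks stay separate until this stage, the gains can be changed and the mix redone at any time.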
The British Broadcasting Corporation (BBC) developed an experimental 10-channel digital tape recorder in 1976. Two years later, the 3M company introduced the first commercial 32-track digital recorder (figure 1.6) as well as a rudimentary digital tape editor (Duffy 1982). The first random-access sound editor and mixer based on computer disk was developed by the Soundstream company in Salt Lake City, Utah (see figure 16.38). Its pioneering system could mix up to eight tracks or sound files stored on computer disk at a time (Ingebretsen and Stockham 1984).
Figure 1.6 3M 32-track digital tape recorder, introduced in 1978.
Early digital multitrack recording was a very expensive enterprise. The Studer digital recorder (figure 1.7) sold for $270,000 in 1991. Then within a short time, software DAWs replaced tape recorders in most studios. It became possible to record, edit, and mix on portable laptop computers.
Figure 1.7 Studer D820-48 DASH digital multitrack recorder, introduced in 1991 with a retail price of about $270,000. Making a backup copy of a tape required two machines.
High-quality digital field recorders with built-in microphones became popular (figure 1.8).
Figure 1.8 Sony PCM-D100 field recorder.
To some extent, the functionality of portable field recorders can be achieved on mobile phones with accessories for an audio interface and high-quality stereo microphones. The downside of mobile devices is rapid obsolescence, which afflicts anything connected to them.
The art of recording requires more than proper equipment. Serious students of recording have much to learn through apprenticeship. A number of schools offer four-year Tonmeister degrees in recording engineering. Tonmeisters study music as well as applied physics. They learn about the acoustics of rooms (reflection, absorption, and diffraction of sound waves) and instruments, microphone types, microphone techniques, and audio media production methods.
Frequency-Domain Representation
This chapter introduces basic concepts and terminology for describing sound signals, including frequency, amplitude, and phase.
Sound reaches listeners’ ears after being transmitted through air from a source. We hear sound because the air pressure is changing slightly in our ears, causing the eardrum to vibrate. If the pressure varies according to a repeating pattern, we say that the sound has a periodic waveform. If there is no discernible pattern it is called noise. In between these two extremes is a vast domain of quasiperiodic and quasinoisy sounds.
One repetition of a periodic waveform is called a cycle, and the fundamental frequency of the waveform is the number of cycles that occur per second. In the rest of this book we substitute Hz for cycles per second in accordance with standard acoustical terminology. (Hz is an abbreviation for hertz, the unit of measurement named after the German acoustician Heinrich Hertz.)
As the length of the cycle (called the period) decreases, the frequency in cycles per second increases, and vice versa. Table 2.1 shows the relationship between frequency and period.
Table 2.1 Relationship of frequency to period

| Frequency | Period |
|---|---|
| 1 Hz | 1 second (s) |
| 10 Hz | 0.1 s or 100 milliseconds (ms) |
| 100 Hz | 0.01 s or 10 ms |
| 1000 Hz | 0.001 s or 1 ms |
| 10000 Hz | 0.0001 s or 100 microseconds (µs) |
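Each row of table 2.1 follows from the reciprocal relation T = 1/f between period and frequency. A minimal sketch that reproduces the table; the function name `period` is illustrative.

```python
def period(freq_hz):
    """Period in seconds of a periodic waveform: T = 1 / f."""
    return 1.0 / freq_hz

for f in (1, 10, 100, 1000, 10000):
    T = period(f)
    print(f"{f:>6} Hz -> {T:g} s = {T * 1000:g} ms")
```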
Another descriptive term is wavelength, the physical distance a wave travels during one period (the distance between corresponding points on successive cycles). Because sound travels at about 343 m/s in air at 20° Celsius, a wave at 1 Hz spans about 343 m, whereas a wave at 20 kHz spans about 0.017 m, or about 1.7 cm.
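The two wavelength figures come from dividing the speed of sound by the frequency (λ = c/f). The following is a minimal sketch that uses the 343 m/s value given in the text. The constant and function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C, the value used in the text

def wavelength(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters: lambda = c / f."""
    return c / freq_hz

print(wavelength(1))      # 343.0
print(wavelength(20000))  # about 0.01715 m, roughly 1.7 cm
```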
A simple method of depicting sound waveforms is to draw them in the form of a graph of air pressure versus time (figure 2.1). This pressure graph is called a time-domain representation. When the curved line is near the bottom of the graph, the air pressure is lower; when the curve is near the top of the graph, the air pressure has increased. The amplitude of the waveform is the amount of air pressure change; we can measure amplitude as the vertical distance from the zero pressure point to the highest (or lowest) points of a given waveform segment.
Figure 2.1 Time-domain representation of a signal. The vertical dimension shows the air pressure. When the curved line is near the top of the graph, the air pressure is greater. Below the solid horizontal line, the air pressure is reduced. Atmospheric pressure variations heard as sound can occur quickly; for musical sounds, this entire graph might last no more than one-thousandth of a second (1 ms).
An acoustic instrument creates sound by emitting vibrations that change the air pressure around the instrument. A loudspeaker creates sound by moving back and forth according to voltage changes in an electronic signal. When the loudspeaker moves in from its rest position, the air pressure near it decreases; as it moves out, the air pressure near it rises. To create an audible sound, these in/out vibrations must occur at a frequency in the range of about 20 to 20,000 Hz.
Besides the fundamental frequency, there can be many frequencies present in a waveform. A frequency-domain or spectrum representation shows the frequency content of a sound. The individual frequency components of the spectrum can be referred to as harmonics or partials. Harmonic frequencies are simple integer multiples of the fundamental frequency. Assuming a fundamental or first harmonic of 100 Hz, its second harmonic is 200 Hz, its third harmonic is 300 Hz, and so on. More generally, any frequency component can be called a partial, whether or not it is an integer multiple of a fundamental. Indeed, many sounds have no particular fundamental frequency.
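The definition of a harmonic, f_n = n × f0, and the looser notion of a partial translate directly into code. A minimal sketch; `harmonics` and `is_harmonic` are hypothetical helper names, and the tolerance is an arbitrary choice.

```python
def harmonics(fundamental_hz, count):
    """Frequencies of the first `count` harmonics: n * f0 for n = 1..count."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def is_harmonic(partial_hz, fundamental_hz, tol=1e-9):
    """True if a partial lies (within tol) at an integer multiple of the fundamental."""
    ratio = partial_hz / fundamental_hz
    return round(ratio) >= 1 and abs(ratio - round(ratio)) < tol

print(harmonics(100, 5))     # [100, 200, 300, 400, 500]
print(is_harmonic(300, 100))  # True
print(is_harmonic(250, 100))  # False: a partial, but not a harmonic of 100 Hz
```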
The frequency content of a waveform can be displayed in many ways. A standard way is to plot each partial as a line along an x-axis. The height of each line indicates the strength (or amplitude) of each frequency component. The purest signal is a sine waveform, so named because it can be calculated using trigonometric formulae for the sine of an angle. A pure sine wave represents just one frequency component or one line in a spectrum. Figure 2.2 depicts the time-domain and frequency-domain representations of several waveforms. Notice that the spectrum plots are labeled harmonics on their horizontal axis because the analysis algorithm assumes that its input is exactly one period of the fundamental of a periodic waveform. In the case of the noise signal in figure 2.2g, this assumption is not valid, so we relabel the partials as frequency components.
Figure 2.2 Time-domain and frequency-domain representations of four signals. (a) Time-domain view of one cycle of a sine wave. (b) Spectrum of the one frequency component in a sine wave. (c) Time-domain view of one cycle of a sawtooth waveform. (d) Spectrum of a sawtooth wave, whose harmonic amplitudes decrease steadily (ideally in proportion to 1/n for the nth harmonic). (e) Time-domain view of one cycle of a complex waveform. Although the waveform looks complex, when it is repeated over and over its sound is actually simple, like a thin reed organ sound. (f) The spectrum of waveform (e) shows that it is dominated by a few frequencies. (g) A random noise waveform. (h) If the waveform is constantly changing (each cycle is different from the last cycle), then we hear noise. The frequency content of noise is very complex. In this case the analysis extracted 252 frequencies. This snapshot does not reveal how their amplitudes are constantly changing over time.
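Two claims in the passage above can be checked numerically: a sine wave has a single spectral line, and a sawtooth's harmonics fall off steadily. The following is a minimal sketch that uses a naive discrete Fourier transform (DFT) over exactly one period of N samples. Like the analysis described in the text, it assumes the input is one cycle of a periodic waveform. The sawtooth is built additively from its first 16 harmonics with amplitudes 1/n. This standard construction is assumed here to keep the signal bandlimited.

```python
import cmath
import math

def dft_magnitudes(x):
    """Naive DFT of one period x[0..N-1]; returns bins 0..N/2.
    Bins 0 < k < N/2 are scaled so that a sine of amplitude A reads A."""
    N = len(x)
    mags = []
    for k in range(N // 2 + 1):
        s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        mags.append(2 * abs(s) / N)
    return mags

N = 64
sine = [math.sin(2 * math.pi * n / N) for n in range(N)]
saw = [sum(math.sin(2 * math.pi * h * n / N) / h for h in range(1, 17)) for n in range(N)]

sine_spec = dft_magnitudes(sine)
saw_spec = dft_magnitudes(saw)
print([round(m, 3) for m in sine_spec[:4]])  # [0.0, 1.0, 0.0, 0.0]
print([round(m, 3) for m in saw_spec[1:5]])  # [1.0, 0.5, 0.333, 0.25]
```

A naive DFT costs on the order of N² operations. Practical analyzers use the fast Fourier transform instead, but the result is the same.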
As we discuss in chapters 36–39, there are many ways to plot the spectrum of a sound.
The point in its cycle at which a periodic waveform starts is its initial phase; it determines the waveform's starting value on the y (amplitude) axis. For example, a typical sine wave starts at amplitude 0 and completes its cycle at 0. If we displace the starting point by π/2 radians (90°) along the horizontal axis, the sinusoidal wave starts and ends at 1 on the amplitude axis. By convention this is called a cosine wave. In effect, a cosine is equivalent to a sine wave that is phase shifted by 90° (figure 2.3).
Figure 2.3 A sine waveform is equivalent to a cosine waveform that has been delayed by a quarter cycle and hence phase shifted by 90°.
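The relation between sine and cosine is the identity sin(θ + π/2) = cos(θ), or equivalently cos(θ − π/2) = sin(θ). A minimal numerical check at eight points around the cycle:

```python
import math

# Shifting a sine by +pi/2 radians (90 degrees) gives a cosine,
# and a cosine delayed by a quarter cycle gives back the sine.
for i in range(8):
    theta = 2 * math.pi * i / 8
    assert math.isclose(math.sin(theta + math.pi / 2), math.cos(theta), abs_tol=1e-12)
    assert math.isclose(math.cos(theta - math.pi / 2), math.sin(theta), abs_tol=1e-12)

# Starting values: a sine starts at 0, the phase-shifted version (a cosine) at 1.
print(math.sin(0.0), math.sin(0.0 + math.pi / 2))  # 0.0 1.0
```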
When two signals start at the same point, they are said to be in phase or phase aligned. By contrast, when one signal is slightly delayed with respect to another, the two signals are out of phase. When signal A is exactly opposite in phase to signal B (i.e., 180° out of phase, so that for every positive value in signal A there is a corresponding negative value in signal B), we say that B has reversed polarity with respect to A. We could also say that B is a phase-inverted copy of A. Figure 2.4 shows the effect of summing two signals in an inverse phase relationship: they cancel each other out.
Figure 2.4 The effects of phase inversion. (b) is a phase-inverted copy of (a). If the two waveforms are added together, (c) they sum to zero.
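The cancellation in figure 2.4 can be reproduced with sampled signals. The following is a minimal sketch using a 1 kHz sine at a 48 kHz sample rate; both values are illustrative. It also shows that, for a sine, a 180° phase shift gives the same result as polarity inversion, apart from rounding error.

```python
import math

def sine(freq, fs, count, phase=0.0):
    """`count` samples of a unit sine at `freq` Hz, sampled at `fs` Hz."""
    return [math.sin(2 * math.pi * freq * n / fs + phase) for n in range(count)]

fs = 48000
a = sine(1000, fs, 48)
b = [-x for x in a]  # polarity-inverted copy of a
print(max(abs(x + y) for x, y in zip(a, b)))  # 0.0: the two signals cancel

c = sine(1000, fs, 48, phase=math.pi)  # the same sine shifted by 180 degrees
print(max(abs(x + y) for x, y in zip(a, c)) < 1e-12)  # True
```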
It is sometimes said that phase is insignificant to the human ear, because two signals that are exactly the same except for their initial phase are difficult to distinguish. However, research indicates that 180° differences in absolute phase or polarity can be distinguished by some people under laboratory conditions (Greiner and Melton 1991). For more on phase perception, refer to Laitinen, Disch, and Pulkki (2013).
Even apart from special cases, phase is an important concept for several reasons. Every filter uses phase shifts to alter signals. A filter shifts the phase of a signal by delaying its input for a short time and then combining the phase-shifted version with the original signal. This creates frequency-dependent phase cancellation, which attenuates certain frequencies, and phase reinforcement, which boosts others. By frequency-dependent we mean that not all frequency components are affected equally. When the phase shift varies over time, the affected frequency bands also vary, creating the sweeping sound effect called phasing or flanging (see chapter 30).
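One of the simplest filters built this way is a feedforward comb filter, y[n] = x[n] + x[n − D], in which the input is delayed by D samples and added back. A frequency is cancelled when the delay equals half its period (f = fs/2D) and doubled when the delay equals a whole period (f = fs/D). This is only one illustrative instance, not a general filter design. The sample rate, the delay, and the helper names are assumptions for the sketch.

```python
import math

def comb(x, delay):
    """Feedforward comb filter: y[n] = x[n] + x[n - delay] (x before n = 0 treated as 0)."""
    return [x[n] + (x[n - delay] if n >= delay else 0.0) for n in range(len(x))]

def steady_peak(freq, fs=48000, delay=24, count=4800):
    """Peak output for a unit sine at `freq`, skipping the first `delay` start-up samples."""
    x = [math.sin(2 * math.pi * freq * n / fs) for n in range(count)]
    return max(abs(v) for v in comb(x, delay)[delay:])

fs, delay = 48000, 24
print(round(steady_peak(fs / (2 * delay)), 6))  # 1000 Hz: delay is half a period, so it cancels
print(round(steady_peak(fs / delay), 3))        # 2000 Hz: delay is a whole period, so it doubles
```

Sweeping the delay over time moves these notches and peaks up and down the spectrum, which is the basis of the flanging effect mentioned above.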
Phase is also important in systems that resynthesize sound on the basis of an analysis of an existing sound. In particular, these systems need to know the starting phase of each frequency component in order to assemble the different components in the right order (M.-H. Serra 1997; X. Serra 1997). Phase data is particularly critical in reproducing short, rapidly changing transient sounds, such as the onset of a percussive tone.
Finally, much effort has gone into designing audio components that shift the phases of their input signals as little as possible, because frequency-dependent phase shifts audibly distort musical signals and interfere with loudspeaker imaging. (Imaging is the ability of a set of loudspeakers to create a stable audio picture in which each audio source is localized to a specific place within the picture.) Unwanted phase shifting is called phase distortion. To make a visual analogy, a phase-distorted signal is out of focus.
We all have an intuitive notion of sound level or sound magnitude. Even a small child understands the function of a volume knob. The decibel is a unit of measurement for relationships of magnitude, including voltage levels, intensity, or power, particularly in audio systems. In acoustic measurements, the decibel scale indicates the ratio of one level to a reference level, according to the relation
$$\text{number of decibels} = 10 \log_{10}\left(\frac{\text{level}}{\text{reference level}}\right)$$
where the reference level is usually the threshold of hearing (10⁻¹² watts per square meter). Because decibels are logarithmic, two notes of 60 dB each sounding together raise the level by just 3 dB, to about 63 dB. A millionfold increase in intensity results in only a 60 dB boost.
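The two numerical claims can be checked with the decibel formula. This is a minimal sketch. It assumes that the two notes add in intensity, which holds for unrelated (uncorrelated) sources. The function name is illustrative.

```python
import math

def decibels(level, reference=1e-12):
    """Level in dB re a reference: 10 * log10(level / reference).
    Default reference: the threshold of hearing, 1e-12 W/m^2 (intensity)."""
    return 10 * math.log10(level / reference)

one_note = 1e-6                          # intensity of a 60 dB note, in W/m^2
print(round(decibels(one_note), 2))      # 60.0
print(round(decibels(2 * one_note), 2))  # 63.01: two equal notes add about 3 dB
print(round(decibels(1e6, 1), 2))        # 60.0: a millionfold intensity ratio
```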
Figure 2.5 shows the decibel scale and some estimated acoustic power levels relative to 0 dB.
Figure 2.5 Typical acoustic power levels for various acoustic sources. All values are relative to 0 dB.
Chapter 4 contains a more thorough discussion of sound magnitude and decibels.
Dynamic range is the ratio between the loudest and the softest sound that can be handled by a system without distortion. Two important facts describe the dynamic range requirements of a digital audio system:
In recording music, we want to reproduce the full expressive power of the music. Thus it is important to capture the widest possible dynamic range. In a live orchestra concert, for example, the dynamic range can vary from silence to a tutti (full orchestra) section exceeding 110 dB.
Every recording device (such as a microphone, mic preamplifier, mixer, or recorder) can handle only a certain dynamic range before it distorts. For example, the dynamic range of analog tape equipment is dictated by the physics of the analog recording process. That range is around 80 dB for a 1 kHz tone using professional reel-to-reel tape recorders without noise-reduction devices. By contrast, a high-quality digital audio recorder can have a dynamic range of around 120 dB.
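Inverting the decibel relation turns the dynamic-range figures above back into ratios, which shows how large the gap is. This minimal sketch treats each figure primarily as an intensity (power) ratio, consistent with the 10 × log10 relation given earlier. It also shows the corresponding amplitude (voltage) ratio, which uses 20 × log10 because intensity is proportional to amplitude squared.

```python
def db_to_intensity_ratio(db):
    """Invert dB = 10 * log10(ratio) for intensity or power ratios."""
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    """Invert dB = 20 * log10(ratio) for amplitude or voltage ratios."""
    return 10 ** (db / 20)

for label, db in [("analog tape, no noise reduction", 80), ("high-quality digital recorder", 120)]:
    print(f"{label}: {db} dB = intensity ratio {db_to_intensity_ratio(db):.0e}, "
          f"amplitude ratio {db_to_amplitude_ratio(db):.0e}")
```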
Chapter 4 goes into more detail about dynamic range in digital audio systems.
Curtis Roads with John M. Strawn
Analog Representations of Sound
Digital Representations for Sound
Digital Audio Samples Are Not MIDI Data
Reconstruction of the Analog Signal
Antialiasing and Anti-imaging Filters
This chapter begins by explaining differences between analog and digital audio systems. We then step through the basics of the digital audio recording and playback chain. For further technical study, consult Pohlmann (2010).
Just as air pressure varies according to sound waves, so can the electrical property called voltage in a wire connecting an amplifier with a loudspeaker. We do not need to define voltage here. For the purposes of this chapter, we simply assume that it is possible to modify an electrical property in a fashion that closely matches changes in air pressure. We can say that air pressure and voltage are analogous to each other. That is, a graph of the air pressure variations picked up by a microphone looks similar to a graph of the variations in the loudspeaker position when that sound is played back. The term analog means that these properties can vary in a similar manner.
Figure 3.1 shows an analog audio playback chain. The curve of an audio waveform can be inscribed along the groove of a traditional phonograph record. The walls of the grooves on a phonograph record contain a continuous-time representation of the sound stored in the record. As the needle glides through the groove, it moves back and forth in lateral motion. This lateral motion is then changed into voltage, which is amplified and eventually reaches the loudspeaker.
Figure 3.1 The analog audio playback chain, starting from an analog waveform transduced from the grooves of a phonograph record to a voltage sent to a preamplifier, amplifier, and loudspeaker and projected into the air.
Analog recording and reproduction of sound has been taken to a high level, but it faces fundamental physical limits. Specifically, when one copies an analog recording onto another analog recorder, the copy is never as good as the original. The reason is that the analog recording process always adds noise. On a first-generation or original recording made with a high-quality tape recorder, this noise is not objectionable. But if we copy the first-generation tape onto another tape and then copy the copy, the noise increases noticeably. In contrast, digital technology can create any number of generations of perfect (noise-free) clones of an original digital recording, as we show further on.
This section introduces the most basic concepts associated with digital signals, including the conversion of audio signals into binary numbers, and comparison of audio data with MIDI data.
Let us look at the process of digitally recording sound and then playing it back. Rather than the continuous-time signals of the analog world, a digital recorder handles discrete-time signals. Figure 3.2 shows a diagram of the digital audio recording and playback process. In this diagram, a microphone transduces air pressure variations into electrical voltage variations, which are then passed through an antialiasing filter and then to an analog-to-digital converter (ADC). (We discuss the function of the antialiasing filter in a subsequent section.) The ADC samples and converts the voltage variations into a string of binary numbers at each uniform period of the sample clock. The binary numbers are then stored in a digital recording medium—a type of memory.
Figure 3.2 Overview of digital recording and playback.
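The ADC step described above can be imitated in software. The input is read at each tick of a uniform sample clock (every 1/fs seconds), and each reading is rounded to the nearest signed 16-bit integer. This is a minimal sketch. It assumes that the analog input is available as a Python function of time, that full scale corresponds to ±1.0, and that the 1 kHz test tone is hypothetical. A real converter performs these steps in hardware.

```python
import math

def adc(signal, fs, seconds, bits=16):
    """Sample signal(t) every 1/fs seconds and quantize to signed `bits`-bit integers."""
    full_scale = 2 ** (bits - 1) - 1             # 32767 for 16 bits
    samples = []
    for n in range(int(round(fs * seconds))):
        t = n / fs                               # uniform sample clock
        v = max(-1.0, min(1.0, signal(t)))       # clip to the converter's input range
        samples.append(round(v * full_scale))
    return samples

def tone(t):
    """Hypothetical analog input: a 1 kHz sine at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

pcm = adc(tone, fs=8000, seconds=0.001)  # eight samples: one cycle of the tone
print(pcm)
```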
In contrast to decimal (base ten) numbers, which use the ten digits 0–9, binary (base two) numbers use only two digits: 0 and 1. The term bit is an abbreviation of binary digit. Table 3.1 lists binary numbers and their decimal equivalents. In many digital systems the leftmost bit of a signed integer is interpreted as a sign indicator, with a 0 indicating a positive number (or zero) and a 1 indicating a negative number. (Numbers with fractional parts, such as 8.476, can also be represented in binary as floating-point numbers, but we will not explain this scheme here.)
Table 3.1 Binary numbers and their decimal equivalents

| Binary | Decimal |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 10 | 2 |
| 11 | 3 |
| 100 | 4 |
| 1000 | 8 |
| 10000 | 16 |
| 100000 | 32 |
| 1111111111111111 | 65,535 |
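Python's built-in `format` and `int(..., 2)` reproduce the rows of table 3.1. The second function shows two's-complement encoding, the signed-integer scheme used by, for example, CD audio and 16-bit WAVE files, in which a leading 1 marks a negative value. The function names are illustrative.

```python
def to_binary(n):
    """Unsigned binary string for a non-negative integer."""
    return format(n, "b")

def twos_complement(value, bits=16):
    """Bit pattern of a signed integer in `bits`-bit two's complement."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError("value out of range")
    return format(value & (2 ** bits - 1), f"0{bits}b")

for d in (0, 1, 2, 3, 4, 8, 16, 32, 65535):
    print(to_binary(d), d)
print(int("1111111111111111", 2))  # 65535 when read as unsigned
print(twos_complement(1))          # 0000000000000001
print(twos_complement(-1))         # 1111111111111111: leading 1 marks a negative value
```

The same sixteen 1s therefore mean 65,535 when read as an unsigned number but −1 when read as a signed 16-bit sample.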
The way a bit is physically encoded in a recording medium depends on the properties of that medium. On a magnetic disc, for example, a 1 might be represented by a positive magnetic charge, whereas a 0 is indicated by the absence of such a charge. This is different from an analog magnetic tape recording, in which the signal is represented as a continuously varying magnetic charge. On an optical medium such as a compact disc, binary data might be encoded as variations in the reflectance at a particular location. Solid-state or flash memory encodes a bit as an electrical charge in a memory cell made out of transistors.
Figure 3.3 depicts the result of converting an audio signal (a) into a digital signal (b). When the listener wants to hear the sound again, the binary numbers are read one by one from the digital storage and passed through a digital-to-analog converter (DAC). This device, driven by a sample clock, changes the stream of numbers into a series of voltage levels. From there, the process is the same as shown in figure 3.2; that is, the series of voltage levels are lowpass filtered into a continuous-time waveform (figure 3.3c), amplified, and routed to a loudspeaker, whose vibration causes the air pressure to change. Voilà—the signal sounds again.
Figure 3.3 Analog and digital representations of a signal. (a) Analog sine waveform. The horizontal bar below the wave indicates one period or cycle. (b) Sampled version of the sine waveform in (a), as it might appear at the output of an ADC. Each vertical bar represents one sample. Each sample is stored in memory as a number that represents the height of the vertical bar. (c) Reconstruction of the sampled version of the waveform in (b). In effect, the samples are connected by the lowpass smoothing filter to form the waveform that eventually reaches the listener’s ear.
In summary, we can change a sound in the air into a string of binary numbers that can be stored digitally. The central component in this conversion process is the ADC. When we want to hear the sound again, a DAC can change those numbers back into sound.
This section may clear up confusion. The string of numbers generated by the ADC is not related to Musical Instrument Digital Interface (MIDI) data. (For more information on MIDI, see chapter 52.) Digital audio recording samples the sound waveform, whereas MIDI recording captures the data played on a controller, such as a keyboard. MIDI note information includes only the start and end times, pitch (a number), and amplitude at the beginning of the note (a number). If a MIDI note is transmitted back to the synthesizer on which it was originally played, the synthesizer plays the sound as it did originally, similarly to a piano roll recording. If the musician plays four quarter notes at a tempo of sixty beats per minute on a MIDI synthesizer, just sixteen pieces of information capture this four-second sound (four starts, ends, pitches, and amplitudes).
By contrast, if we record the same sound with a microphone connected to a digital audio recorder set to a sampling frequency of 44.1 kHz, 352,800 pieces of information (in the form of audio samples) are recorded for the same sound (44,100 × 2 channels × 4 seconds). The storage requirements of digital audio recording are large. Using 16-bit (2-byte) samples, it takes over 700,000 bytes to store a four-second sound. Assuming each of the sixteen MIDI values occupies one byte, this is 44,100 times more data than is stored by MIDI.
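The storage figures above follow from straightforward multiplication. The following is a minimal sketch that uses the same simplification as the text, counting one byte for each of the sixteen MIDI values. Real MIDI messages contain status and data bytes, and timing is stored separately. The last line checks the one-minute stereo figure at 50 kHz given later in this chapter.

```python
def audio_samples(fs, channels, seconds):
    """Number of samples in a recording: rate x channels x duration."""
    return fs * channels * seconds

samples = audio_samples(44100, 2, 4)
audio_bytes = samples * 2  # 16-bit samples take 2 bytes each
midi_values = 4 * 4        # four notes x (start, end, pitch, amplitude)
midi_bytes = midi_values   # simplification: one byte per MIDI value
print(samples, audio_bytes, audio_bytes // midi_bytes)  # 352800 705600 44100
print(audio_samples(50000, 2, 60))                      # 6000000
```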
The advantage of a digital audio recording is that it can capture any sound that can be recorded by a microphone, including the human voice. By contrast, MIDI sequence recording is limited to recording control signals that indicate the start, end, pitch, and amplitude of a series of note events.
The digital signal shown in figure 3.3b is significantly different from the original analog signal shown in figure 3.3a. First, the digital signal is defined only at certain points in time. This happens because the signal has been sampled at certain times. Each vertical bar in figure 3.3b represents one sample of the original signal. The samples are stored as binary numbers; the higher the bar in figure 3.3b, the larger the number.
The number of bits used to represent each sample is called the quantization of the system. Quantization determines both the noise level and the amplitude range that can be handled by the system. For example, a compact disc has a quantization of 16 bits. That is, every sample is represented by a 16-bit number. More bits of quantization mean better amplitude resolution. This translates into lower noise and more dynamic range. Fewer bits result in the opposite. We return to this subject in chapter 4 when we discuss quantization at greater length.
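The effect of quantization can be measured directly. Each additional bit doubles the number of available levels and halves the worst-case rounding error. The following is a minimal sketch. It assumes a signal scaled to ±1 and symmetric signed levels, and its test signal (three cycles of a sine at 90 percent of full scale) is arbitrary.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest level of a signed `bits`-bit converter."""
    steps = 2 ** (bits - 1) - 1
    return round(x * steps) / steps

def max_error(bits, count=1000):
    """Largest rounding error over three cycles of a sine at 0.9 of full scale."""
    signal = [0.9 * math.sin(2 * math.pi * 3 * n / count) for n in range(count)]
    return max(abs(quantize(s, bits) - s) for s in signal)

for bits in (8, 12, 16):
    print(bits, "bits:", 2 ** bits, "levels, max error", max_error(bits))
```

The error never exceeds half a step, so the 16-bit version's worst case is about 256 times smaller than the 8-bit version's. This is why more bits mean lower noise.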
The rate at which samples are taken—the sampling frequency—is expressed in terms of samples per second. This is an important specification of digital audio systems. It is often called the sampling rate and is expressed in terms of hertz (Hz). Simplifying the measurement 1,000 Hz to 1 kHz, we say, “The sampling rate of a compact disc recording is 44.1 kHz,” where the k is derived from the metric term kilo, which means thousand.
Sampling frequencies around 50 kHz are common in digital audio systems, although both lower and higher frequencies can also be found. In any case, 50,000 numbers per second is a rapid stream of numbers; it means there are 6,000,000 samples for one minute of stereo sound.
The digital signal in figure 3.3b does not show the value between the bars. The duration of a bar is extremely narrow, perhaps lasting only 0.00002 s (two hundred-thousandths of a second). This means that if the original signal changes between the bars, the change is not reflected in the height of a bar until the next sample is taken. In technical terms, we say that the signal in figure 3.3b is defined at discrete times, each such time represented by one sample (vertical bar).
Part of the magic of digital audio is that if the signal is bandlimited, the DAC and associated hardware can exactly reconstruct the original signal from these samples. We call a signal bandlimited if it has frequencies only within a finite range. This means that, given certain conditions, the missing part of the signal between the samples can be restored. This happens when the numbers are passed through the DAC, which includes the lowpass smoothing filter. The lowpass smoothing filter is designed precisely to re-create the missing part of the signal between the discrete samples (see the dotted line in figure 3.3c). Thus, the signal sent to the loudspeaker looks and sounds like the original signal.
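The ideal form of this reconstruction is the Whittaker–Shannon interpolation formula. Each sample contributes a sinc pulse centered on its sample time, and the sum of these pulses recreates a bandlimited signal between the samples. The condition is that the signal contain no frequencies at or above half the sampling rate. Real DACs approximate this ideal with a practical lowpass filter. Because the following minimal sketch uses a finite number of samples, its result is close but not exact. The 440 Hz tone and 8 kHz rate are illustrative.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, fs, t):
    """Estimate x(t) from samples x[n] = x(n / fs), n = first_index, first_index + 1, ..."""
    return sum(s * sinc(t * fs - (first_index + i)) for i, s in enumerate(samples))

fs, f = 8000, 440
N = 4000  # samples on each side of t = 0
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(-N, N + 1)]

t = 2.5 / fs  # halfway between samples 2 and 3, where no sample was taken
estimate = reconstruct(samples, -N, fs, t)
exact = math.sin(2 * math.pi * f * t)
print(abs(estimate - exact) < 1e-3)  # True: the missing value is recovered closely
```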
The process of sampling is not quite as straightforward as it might seem. Just as an audio amplifier can introduce distortion, sampling can play tricks with sound. Figure 3.4 gives an example. Using the input waveform shown in figure 3.4a, suppose that a sample of this waveform is taken at each point in time shown by the vertical bars in figure 3.4b (each vertical bar creates one sample). As before, the resulting samples of figure 3.4c are stored as numbers in digital memory. However, when we attempt to reconstruct the original waveform, as shown in figure 3.4d, the result is something completely different.
Figure 3.4 Problems in sampling. (a) Waveform to be recorded. (b) The sampling pulses; whenever a sampling pulse occurs, one sample is taken. (c) The waveform as sampled and stored in memory. (d) When the waveform from (c) is sent to the DAC, the output might appear as shown here (after Mathews 1969).
In order to understand better the problems that can occur with sampling, we look at what happens when we change the wavelength (the length of one cycle) of the original signal without changing the length of time between samples. Figure 3.5a shows a signal with a cycle eight samples long, figure 3.5d shows a cycle two samples long, and figure 3.5g shows a waveform with eleven cycles per ten samples.
Figure 3.5 Aliasing effects. At the bottom of each set of three graphs, the thick black dots represent samples, and the dotted line shows the signal as reconstructed by the DAC. Every cycle of the sine waveform (a) is sampled eight times in (b). Using the same sampling frequency, each cycle of (d) is sampled only twice in (e). If the sampling pulses in (e) were moved to the right, the output waveform in (f) might be phase-shifted, although the frequency of the output would still be the same. In (h), there are ten samples for the eleven cycles in (g). When the DAC tries to reconstruct a signal, as shown by the dashed lines in (i), a sine waveform results, but the frequency has been completely changed due to the foldover effect. Notice the horizontal double arrow above (g), indicating one cycle of the input waveform, and the arrow above (i), indicating one cycle of the output waveform.
Again, as each of the sets of samples is passed through the DAC and associated hardware, a signal is reconstructed (figures 3.5c, f, and i) and sent to the loudspeaker. The signal shown by the dotted line in figure 3.5c is reconstructed more or less accurately. The results of the sampling in figure 3.5f are potentially less satisfactory; one possible reconstruction is shown there. But in figure 3.5i, the resynthesized waveform is completely different from the original in one important respect. Namely, the wavelength (length of the cycle) of the resynthesized waveform is different from that of the original. In the real world, this means that the reconstructed signal sounds at a pitch different from that of the original signal. This kind of distortion is called aliasing.
The frequencies at which this aliasing occurs can be predicted. Suppose, just to keep the numbers simple, that the sampling rate is 1,000 Hz. Then the signal in figure 3.5a has a frequency of 125 Hz (because there are eight samples per cycle, and 1,000/8 = 125). In figure 3.5d, the signal has a frequency of 500 Hz (because 1,000/2 = 500). The frequency of the input signal in figure 3.5g is 1,100 Hz.
Notice how the frequency of the output signal is different. In figure 3.5i you can count ten samples per cycle of the output waveform. In actuality, the output waveform occurs at a frequency of 1,000/10 = 100 Hz. Thus the sampling process has changed the frequency of the original 1,100 Hz signal in figure 3.5g. This represents an unacceptable change to a musical signal that must be avoided if possible.
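A short numerical check shows why the DAC cannot do better. The sketch below assumes the same illustrative 1,000 Hz sampling rate and unit-amplitude sine waves. Sampled at that rate, a 1,100 Hz sine and a 100 Hz sine produce the same sequence of numbers. Once the samples are stored, nothing distinguishes the two, so the reconstruction comes out at 100 Hz.

```python
import math

FS = 1_000  # sampling rate in Hz, the illustrative value used in the text

def sample_sine(freq_hz: float, n_samples: int, fs: float = FS) -> list[float]:
    """Sample a unit-amplitude sine wave at times n / fs."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

original = sample_sine(1_100, 20)  # the 1,100 Hz input of figure 3.5g
alias = sample_sine(100, 20)       # a 100 Hz tone

# The largest difference between the two sample sequences is only
# floating-point rounding error: they are the same samples.
print(max(abs(a - b) for a, b in zip(original, alias)))
```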
We can generalize from figure 3.5 to say that as long as there are at least two samples per period of the original waveform, we can assume that the resynthesized waveform will have the same frequency. But when there are fewer than two samples per period, the frequency of the original signal is lost. If the original frequency is higher than half the sampling frequency, then
new frequency = sampling frequency − original frequency
This formula is not mathematically complete, but it is sufficient for our discussion here. It means the following. Suppose that we have chosen a fixed sampling frequency. We start with a signal at a low frequency, sample it, and resynthesize the signal after sampling. As we raise the pitch of the input signal (but still keep the sampling frequency constant), the pitch of the resynthesized signal is the same as the pitch of the input signal until we reach a pitch that corresponds to one-half the sampling frequency. As we raise the pitch of the input signal even higher, the pitch of the output signal starts to descend to the lowest frequencies!
The process is depicted in figure 3.6, which shows why aliasing was sometimes called foldover.
Figure 3.6 When the input frequency exceeds the Nyquist frequency, the recorded signal folds over and proceeds downward.
To give a concrete example, suppose that we introduce an analog frequency at 30 kHz into an ADC operating at a 48 kHz sampling rate. When reconstructed by a DAC, it will produce a tone at 18 kHz, because 48 − 30 = 18.
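The simplified formula above applies to inputs between half the sampling frequency and the full sampling frequency. The sketch below is a slightly more general version. It is a standard extension and is not given in the text. It first removes whole multiples of the sampling frequency, which sampling cannot distinguish, and then folds the remainder about the Nyquist frequency. This yields the foldover pattern of figure 3.6. The function name is hypothetical.

```python
def alias_frequency(f_in: float, fs: float) -> float:
    """Frequency produced after sampling a sinusoid of f_in Hz at fs Hz and
    reconstructing with an ideal lowpass filter at fs / 2.
    The result always lies between 0 and fs / 2."""
    f = f_in % fs                            # f and f + k * fs give identical samples
    return fs - f if f > fs / 2 else f       # fold about the Nyquist frequency

print(alias_frequency(30_000, 48_000))  # 18000, the example in the text
print(alias_frequency(1_100, 1_000))    # 100, as in figure 3.5i

# Sweeping the input upward shows the output rising, then folding back down.
for f_in in range(0, 100_001, 10_000):
    print(f_in, alias_frequency(f_in, 48_000))
```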
The sampling theorem (or Nyquist theorem) describes the relationship between the sampling rate and the bandwidth of the signal being transmitted. It was expressed by Harold Nyquist (1928) as follows:
For any given deformation of the received signal, the transmitted frequency range must be increased in direct proportion to the signaling speed.… The conclusion is that the frequency band is directly proportional to the speed.
The essential point of the sampling theorem can be stated precisely as follows:
In order to be able to reconstruct a continuous signal from its samples, the frequency at which we sample must be at least twice the highest frequency in the signal.
The highest frequency that can be reproduced in a digital audio system (i.e., half the sampling rate) is called the Nyquist frequency. In digital musical systems, the Nyquist frequency is usually above the upper range of human hearing, that is, above 20 kHz. Then the sampling frequency can be specified as being at least twice as much, or above 40 kHz. For example, in a recording system that samples at 44.1 kHz (the sampling frequency of compact discs), the Nyquist frequency is 22.05 kHz.
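The sampling theorem and the Nyquist frequency reduce to two one-line calculations. The helper names below are hypothetical. The first function halves the sampling rate. The second doubles the highest frequency that must be captured.

```python
def nyquist_frequency(sampling_rate_hz: float) -> float:
    """Highest frequency a system sampling at this rate can represent."""
    return sampling_rate_hz / 2

def minimum_sampling_rate(highest_frequency_hz: float) -> float:
    """Sampling theorem: sample at least twice the highest frequency present."""
    return 2 * highest_frequency_hz

print(nyquist_frequency(44_100))      # 22050.0 (compact disc)
print(minimum_sampling_rate(20_000))  # 40000 (upper limit of human hearing)
```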
In order to make sure that a digital sound system works properly, two important filters are included. Recall figure 3.2. One filter is placed before the ADC, to make sure that nothing in the input signal occurs at a frequency higher than half the sampling frequency. As long as this filter does the proper work, aliasing should not occur during the recording process. Logically enough, such a filter is called an antialiasing filter.
The other filter is placed after the DAC. Its main function is to change the samples stored digitally into a smooth, continuous representation of the signal. In effect, this lowpass anti-imaging or smoothing filter creates the dotted line in figure 3.3c by connecting the solid black dots in the figure.
The ADCs and DACs built into computers and mobile devices are inexpensive to manufacture. They perform adequately for daily use. However, the standard audio input and output of a computer, tablet, or phone is inadequate for high-quality audio recording and playback.
An audio interface provides a high-quality audio solution (figure 3.7). It connects to the computer or mobile device using a protocol that the device supports (such as USB, Thunderbolt, or Ethernet). It includes high-quality DACs and ADCs and microphone/line preamplifiers. Many have traditional MIDI input/output connectors.
Figure 3.7 Front and back panel of the RME Fireface UFX+ audio interface. The front panel shows four mic/line preamplifiers and MIDI jacks. Twelve analog inputs and outputs are supported. The rear panel bristles with connectors, including optical MADI multichannel jacks. The interface processes a total of ninety-six inputs and outputs and connects to a computer via USB 3 or Thunderbolt.
The cost of audio interfaces ranges from inexpensive units (< $100) designed for home recording to professional units (> $10,000) that support multiple channels at high sampling rates.
The Audio Engineering Society has recommended a set of standard sampling rates for audio, including 32, 44.1, 48, and 96 kHz (Audio Engineering Society 2008). It also recognizes that multiples of these basic rates are in use, such as 88.2, 176.4, 192, 352.8, 384, and even 768 kHz.
What sampling frequency is ideal for high-quality music recording and reproduction? Experts disagree.
We all want cameras with high resolution. Should we not also want high-resolution audio recordings? Higher sampling rates increase the bandwidth of recording, which means better audio quality. However, they also produce much larger files. The more samples in a file, the greater the processing load, and transmitting a high sample rate file over a network takes more time. A file recorded at 192 kHz/24 bits is about six and a half times larger than one recorded at 44.1 kHz/16 bits, and nearly nine times larger if each 24-bit sample is stored in a 32-bit word.
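The file-size comparison can be checked with some simple arithmetic. This sketch assumes uncompressed linear PCM and stereo, and it ignores file headers. The helper name is hypothetical. With 24-bit samples packed into 3 bytes, the ratio comes out at about 6.5. Storing each 24-bit sample in a 32-bit word, as some systems do, pushes the ratio to nearly 9.

```python
def pcm_bytes_per_second(sampling_rate_hz: int, bytes_per_sample: int, channels: int = 2) -> int:
    """Data rate of uncompressed linear PCM, ignoring file headers."""
    return sampling_rate_hz * bytes_per_sample * channels

cd_rate = pcm_bytes_per_second(44_100, 2)         # 16-bit samples: 176,400 bytes/s
hi_res_packed = pcm_bytes_per_second(192_000, 3)  # 24-bit samples packed in 3 bytes
hi_res_padded = pcm_bytes_per_second(192_000, 4)  # 24-bit samples in 32-bit words

print(hi_res_packed / cd_rate)  # about 6.5
print(hi_res_padded / cd_rate)  # about 8.7
```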
One way to view resolution is to see the effect of the sampling process on an analog impulse. Figure 3.8 plots these effects. See how recording at 48 kHz blurs in time the original analog transient, as though it is out of focus.
Figure 3.8 Impulse responses (left to right) of an analog impulse (left) as recorded by 48 kHz, 96 kHz, 192 kHz, and Direct Stream Digital (DSD) recording systems. DSD is discussed in chapter 4.
One justification that has been given for higher sampling rates is that some people hear information (referred to as air) in the region around the 20 kHz limit of human hearing (Neve 1992). Many analog systems can reproduce ultrahigh frequencies. Microphones like the Sanken CO-100K can record sounds up to 100 kHz. Some microphone preamplifiers have a bandwidth that extends far beyond 200 kHz.
Scientific experiments confirm the effects of sounds above 22 kHz from both physiological and subjective viewpoints (Oohashi et al. 1991; Oohashi et al. 1993). Melchior (2019), however, points out how the improved quality of high-resolution recordings can be heard by people without extraordinary high-frequency hearing. She observes that the temporal blur, pre-echo, and ringing inherent in brick-wall filters associated with sampling rates below 50 kHz are contributing factors, and she lists several high-resolution schemes that alleviate these symptoms.
Another argument for high resolution is the more focused spatial imaging that accrues due to the presence of more high-frequency spatial information.
In sound synthesis applications, the lack of frequency headroom in the standard sampling rates of 44.1 and 48 kHz is a source of serious problems. To avoid aliasing, synthesis algorithms must generate nothing other than sine waves at fundamental frequencies above 11.025 kHz (44.1 kHz sampling rate) or 12 kHz (48 kHz sampling rate). The reason is that any nonsinusoidal periodic waveform has partials at integer multiples of its fundamental frequency. Above one-quarter of the sampling rate, every one of these partials, starting with the second harmonic at twice the fundamental, exceeds the Nyquist frequency.
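The quarter-sampling-rate limit follows from listing the harmonics that remain below the Nyquist frequency. The sketch below uses a hypothetical helper that returns the partials k × f0 strictly below fs/2. For example, an additive synthesizer building a bandlimited waveform could include only these partials. Above fs/4, only the fundamental survives.

```python
def harmonics_below_nyquist(f0_hz: float, fs_hz: float) -> list[float]:
    """Harmonic frequencies k * f0 (k = 1, 2, ...) strictly below fs / 2.
    Any harmonic at or above the Nyquist frequency would alias."""
    nyquist = fs_hz / 2
    harmonics = []
    k = 1
    while k * f0_hz < nyquist:
        harmonics.append(k * f0_hz)
        k += 1
    return harmonics

print(harmonics_below_nyquist(5_000, 44_100))   # [5000, 10000, 15000, 20000]
print(harmonics_below_nyquist(12_000, 44_100))  # [12000]: only a sine is safe
```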
In sampling and pitch-shifting applications (see chapter 31), the lack of frequency headroom requires that sampled sounds be lowpass filtered before they are pitch-shifted upward, in order to avoid aliasing. When tones recorded at 44.1 or 48 kHz are pitch-shifted downward, they become muffled and lose their high-frequency content because everything above the Nyquist frequency has already been eliminated by the antialiasing filter. In a recording made at a high sampling rate, pitch-shifting downward does not necessarily mean a loss of high-frequency content, because ultrasonic energy is transposed into the high-frequency audio range.
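Both limits can be estimated with a frequency ratio. This sketch assumes a transposition that multiplies every frequency by the equal-tempered ratio 2^(semitones/12), as resampling-style pitch-shifting does. The function names are hypothetical. When shifting upward, anything above the returned frequency must be filtered out first. When shifting downward, the Nyquist frequency scaled by the ratio bounds the highest frequency that can remain.

```python
def pitch_shift_ratio(semitones: float) -> float:
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2 ** (semitones / 12)

def max_safe_input_frequency(fs_hz: float, semitones_up: float) -> float:
    """Highest input frequency that stays below the Nyquist frequency after
    shifting up by `semitones_up`; higher content must be removed first."""
    return (fs_hz / 2) / pitch_shift_ratio(semitones_up)

print(max_safe_input_frequency(48_000, 12))     # 12000.0: one octave up at 48 kHz

# Shifting down an octave: highest frequency that can remain in the result.
print((48_000 / 2) * pitch_shift_ratio(-12))    # 12000.0 from a 48 kHz recording
print((192_000 / 2) * pitch_shift_ratio(-12))   # 48000.0 from a 192 kHz recording
```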
High sampling rate recordings are preferable from an audio standpoint, but one needs a high-quality playback system to make the effort worthwhile. Many people listen to music on low-quality ear buds connected to mobile devices, where the luxury of high-resolution recordings is wasted.
Jitter is time-based error in sampling. If the clock driving the ADC or DAC is not stable, then the conversions will not happen at the correct times. A typical audible effect of jitter is a high-frequency clicking noise added to the signal.
In a home studio, jitter is not likely to be an issue. The possibility of jitter increases when multiple physical devices are interconnected. Jitter can be a product of multiple factors: clock instability, poor cable/connector quality, impedance mismatches, or software issues. Thus jitter is a practical issue in complex digital audio systems. The general solution to jitter problems is to rely on a master wordclock generator device that is interconnected to all other devices via high-quality cables and connectors.
The sampling rate determines how often a digital system measures a continuous signal in time. The next chapter presents the topic of quantization, that is, how precisely a digital system measures a signal in amplitude. Taken together, these two processes, sampling and quantization, constitute the core of digital audio theory.
Signal-to-Noise Ratio and Dynamic Range
Low-Level Quantization Noise and Dither
Digital Audio Media and Formats
Lossless versus Lossy File Formats
This chapter examines the notion of sound magnitude as it pertains to digital audio recording. We then look at the technical issues surrounding dynamic range, quantization, oversampling, and digital audio media and formats.
As pointed out in chapter 2, everyone has an intuitive notion of sound level or magnitude. Dozens of terms have been devised by scientists to describe the magnitude of a sound. Among many are the following:
From a scientific point of view, these terms are all different. From a common sense point of view, the terms are all correlated and proportional to one another: a significant boost in one corresponds to a boost in all. Our ears are sharply attuned to sound magnitude, so the concept is physical and directly perceivable.
From a musical point of view, the most useful scientific terms are peak-to-peak and RMS amplitude (as seen in a sound editor), gain (a standard term for boosting or attenuating a sound), sound pressure level (what a sound-level meter measures in the air), and loudness (perceived magnitude). Sound energy, power, and intensity are terms used by physicists to describe measures of sound magnitude relative to the amount of work done, that is, how much energy it takes to vibrate a medium.
Table 4.1 summarizes the formal definitions of these terms. Figure 4.1 illustrates three of the most important measures: peak, peak-to-peak, and RMS amplitude. The rest of this section explains the useful concept of decibels.
Table 4.1 Units for measuring sound magnitude

| Term | Definition |
|---|---|
| Peak-to-peak amplitude | A measure of the peak-to-peak difference in waveform values expressed as a percentage or as decibels (dB). Useful for describing the magnitude of periodic waveforms in particular. |
| RMS amplitude | For complex signals such as noise, root mean square (RMS) amplitude describes the average power of the waveform. RMS amplitude is the square root of the mean over time of the square of the vertical distance of the waveform from the rest position. |
| Gain | A measure of the ratio of the output to the input amplitude (or power) of a process, usually measured in decibels. A gain greater than 0 dB is a boost, and a gain less than 0 dB corresponds to attenuation. |
| Sound energy | A measure of work, sound energy is the ability to vibrate a medium, expressed in joules. A joule is a unit of energy corresponding to the work done by a force of 1 newton acting through a distance of 1 meter. A newton is equal to the amount of force required to give a mass of 1 kilogram an acceleration of 1 meter per second squared. |
| Sound power | The rate at which work is done or energy is used. The standard unit of power is the watt, corresponding to 1 joule per second. 1 watt is the rate at which work is done when an object is moving at 1 meter per second against a force of 1 newton. |
| Sound intensity | Sound power per unit area, measured in watts per square meter. |
| Sound pressure level (SPL) | Sound pressure at a particular point, given in decibels as a ratio of the sound pressure to a reference sound pressure of 20 micropascals. A pascal is a unit of pressure equivalent to a force of 1 newton per square meter. |
| Loudness | A psychoacoustic measure based on queries of human subjects, measured in phons. 1 phon equals 1 dB SPL at 1 kHz. |
Figure 4.1 Measures of amplitude. (1) Peak amplitude. (2) Peak-to-peak amplitude. (3) RMS amplitude.
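The three measures in figure 4.1 can be computed directly from a list of sample values, following the definitions in table 4.1. In this sketch the helper names are hypothetical, and the signal is assumed to be centered on a rest position of zero. The test signal is one cycle of a sine wave with a peak amplitude of 0.5.

```python
import math

def peak_amplitude(x: list[float]) -> float:
    """Largest absolute deviation from the rest position (zero)."""
    return max(abs(v) for v in x)

def peak_to_peak_amplitude(x: list[float]) -> float:
    """Difference between the highest and lowest values."""
    return max(x) - min(x)

def rms_amplitude(x: list[float]) -> float:
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in x) / len(x))

# One cycle of a sine wave with peak amplitude 0.5, sampled 1,000 times.
sine = [0.5 * math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(peak_amplitude(sine), peak_to_peak_amplitude(sine), rms_amplitude(sine))
```

For a sine wave, the RMS amplitude equals the peak amplitude divided by the square root of 2, which is about 0.354 here.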
The ear is an extremely sensitive organ. Suppose that we sit three meters in front of a loudspeaker that is generating a sine tone at 1,000 Hz that we perceive as being very loud. Amazingly, one can reduce the power by a factor of one million and the tone is still audible. In an anechoic chamber where all external sounds are eliminated, the reduction extends to a factor of more than one billion (Backus 1977).
Sound transports energy generated by the vibration of a source. The range of sound energy encompasses everything from the subsonic flutterings of a butterfly to massive explosions. A whisper produces only a few billionths of a watt. By contrast, a large rocket launch generates about 10 million watts of power.
The decibel (dB) unit compresses these huge exponential variations into a smaller range by means of logarithms. The dB unit can be applied to myriad physical phenomena; however, the definition changes according to the phenomenon being measured. A standard unit in audio is dB SPL. This compares a given sound pressure with a standard reference pressure. The level in decibels is 20 times the base-10 logarithm of this ratio:
SPL in decibels = 20 log10 (W/W0)
where W is the sound pressure of the signal being measured, and W0 is the standard reference pressure of 20 micropascals. This reference corresponds to the quietest sound that a human being can hear.
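The dB SPL formula can be written as a one-line function. This sketch assumes the input is an RMS sound pressure in pascals, which is the usual convention. The helper name is hypothetical. A pressure equal to the 20-micropascal reference gives 0 dB, and 1 pascal comes out at about 94 dB SPL.

```python
import math

P_REF = 20e-6  # reference sound pressure W0: 20 micropascals

def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS sound pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

print(spl_db(20e-6))          # 0.0, the reference level
print(round(spl_db(1.0), 1))  # 94.0
```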
Expressing sound levels in dB makes an enormous range of magnitudes manageable. Table 4.2 shows how the decibel unit compresses large changes in percentage amplitude into relatively small changes in the number of dB.
Table 4.2 Amplitude as a percentage versus as decibels

| Amplitude | Level |
|---|---|
| 100% | 0 dB |
| 70% | −3 dB |
| 50% | −6 dB |
| 25% | −12 dB |
| 12.5% | −18 dB |
| 6.25% | −24 dB |
| 3.125% | −30 dB |
| 1.562% | −36 dB |
| 0.781% | −42 dB |
| 0.39% | −48 dB |
| 0.195% | −54 dB |
| 0.097% | −60 dB |
| 0.048% | −66 dB |
| 0.024% | −72 dB |
| 0.012% | −78 dB |
| 0.006% | −84 dB |
| 0.003% | −90 dB |
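The entries in table 4.2 follow from applying the same 20 log10 relation to an amplitude ratio. The sketch below converts in both directions, and its helper names are hypothetical. The exact values differ slightly from the rounded table. For example, 50% is −6.02 dB, and −3 dB corresponds to about 70.8%.

```python
import math

def amplitude_to_db(ratio: float) -> float:
    """Level in dB of an amplitude ratio (1.0 = 100% = 0 dB)."""
    return 20 * math.log10(ratio)

def db_to_amplitude(db: float) -> float:
    """Amplitude ratio corresponding to a level in dB."""
    return 10 ** (db / 20)

for percent in (100, 70, 50, 25, 12.5):
    print(f"{percent}%  {amplitude_to_db(percent / 100):6.2f} dB")
```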
As we move away from a sound source, its SPL diminishes according to the distance. Specifically, each doubling of distance decreases SPL by about 6 dB, which represents a 50% decrease in its amplitude. This is the famous inverse square law: intensity diminishes as the square of the distance.
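The 6 dB figure can be checked with a short sketch. It assumes an idealized point source in a free field, where sound pressure falls as 1/distance and intensity therefore falls as 1/distance squared. The helper name is hypothetical.

```python
import math

def level_change_db(distance_from: float, distance_to: float) -> float:
    """Change in SPL (dB) when moving between two distances from a point
    source in a free field (pressure proportional to 1 / distance)."""
    return -20 * math.log10(distance_to / distance_from)

print(round(level_change_db(1, 2), 2))   # -6.02: doubling the distance
print(round(level_change_db(1, 10), 2))  # -20.0: ten times the distance
```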
So far we have been talking in terms of amplitude and SPL. Another pair of terms, volume and loudness, are intuitive. Technically, loudness refers to perceived subjective intensity measured through psychoacoustic tests on human beings and not to sound pressure level measured by laboratory instruments. For example, the ear is especially sensitive to frequencies between 1,000 Hz and 4,000 Hz. Tones in this region sound louder than tones of equal intensity at other frequencies. Thus the measurement of loudness falls into the realm of psychoacoustics. In order to differentiate loudness level (a perceptual characteristic) from sound pressure level (a physical characteristic), the unit phon (rhymes with John) is used. For example, to sound equally loud (60 phons), a tone at about 30 Hz needs to be boosted 40 dB more than a 1,000 Hz tone.
Sampling at discrete time intervals constitutes one of the major differences between digital and analog signals. Another difference is quantization, which is sampling at discrete amplitude intervals. Digital numbers do not have infinite precision. They can be represented only within a certain range and with a certain accuracy, which varies with the hardware used.
Quantization is an important factor in digital audio quality. Specifically, the number of bits per sample (also called the sample width, bit depth, or quantization level) is important in calculating the signal-to-noise ratio (SNR). SNR is a measure of the noisiness of a system: a high SNR means low noise; a low SNR means high noise.
SNR measures the ratio of the strength of an audio signal to the strength of noise. In an audio system, the SNR is specified as the difference in level between the standard operating level (usually 0 on a VU meter) and the average level of the noise floor, expressed in decibels. In general, each additional bit in the analog-to-digital converter will contribute about 6 dB to the SNR.
Another measurement is the dynamic range (DR) of a digital sound system. In simple terms, this is the difference in dB between the loudest and softest sounds that the system can produce. Clearly, SNR and DR are correlated; when the DR is high, so is the SNR. In theory, this is straightforward. It becomes more complicated when stipulating exactly how and in what units of measurement SNR and DR are calculated. As mentioned, the SNR is the difference between the standard operating level of 0 VU and the noise floor. The DR is the difference between the noise floor and the point of distortion. Thus the DR depends on a method of measuring distortion, which people define in different ways.
For our purposes, a simple formula for the dynamic range of a digital audio system is
Dynamic range in decibels = Number of bits × 6.11
The number 6.11 is a close approximation to the theoretical maximum (van de Plassche 1983; van de Plassche and Dijkmans 1983; Hauser 1991); in practice, 6.0 is a more realistic figure. A derivation of this formula is given in Mathews (1969) and Blesser (1978).
Thus, if we record sound with an 8-bit system, then the upper limit on the DR is approximately 48 dB. This is correlated with a low SNR and is audibly noisy. But if we record 16 bits per sample, the dynamic range increases to a maximum of 96 dB—a major improvement. 16-bit audio became the standard of the compact disc. A 20-bit converter offers a potential DR of 120 dB, which corresponds roughly to the range of the human ear.
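The dynamic-range figures above can be reproduced by treating the range as the ratio between full scale and a single quantization step, which is 2^N for N bits. This sketch models it that way. It is not the cited derivation. The model gives 20 log10(2), or about 6.02 dB, per bit, which lies between the 6.0 and 6.11 figures in the text. The helper name is hypothetical.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 20, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
```

The 8-, 16-, and 20-bit results come out at roughly 48, 96, and 120 dB, which matches the values quoted above.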
This discussion assumes that we are using a linear PCM scheme that stores each sample as an integer representing its value. Blesser (1978), Moorer (1979b), and Pohlmann (2010) have reviewed the implications of other encoding schemes. Some encoding schemes (e.g., MP3, discussed later) have the goal of reducing the total number of bits that the system stores or transmits.
Samples are often represented as integers. If the input signal has a voltage corresponding to a sample value between 53 and 54, for example, then the converter might round it off and assign a value of 53. In general, for each sample taken, the value of the sample usually differs slightly from the value of the original signal. This problem in digital signals is known as quantization error or quantization noise (Blesser 1978; Maher 1992; Lipshitz, Wannamaker, and Vanderkooy 1992; Pohlmann 2010).
Figure 4.2 shows the kinds of quantization errors that can occur. When the input signal is something complicated like a symphony, and we listen to just the errors, shown at the bottom of figure 4.2, it sounds like noise. If the errors are large, then one might notice something similar to analog tape hiss at the output of a system.
Figure 4.2 Effects of quantization. (a) Analog waveform. (b) Sampled version of the waveform in (a). Each sample can be assigned only certain values, which are indicated by the short horizontal dashes at the left. The difference between each sample and the original signal is shown in (c), where the height of each bar represents the quantization error.
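The quantization error in figure 4.2c is the difference between each input value and the level the converter assigns to it. In this sketch, a hypothetical rounding quantizer with a step of 1.0 stands in for the converter. A real converter may truncate instead of rounding, but with rounding the error never exceeds half a step.

```python
def quantize(value: float, step: float = 1.0) -> float:
    """Round a value to the nearest multiple of `step` (uniform quantizer).
    Python's round() sends exact .5 ties to the even neighbour."""
    return step * round(value / step)

x = 53.4              # an input falling between the levels 53 and 54
q = quantize(x)       # the assigned level, 53.0
error = x - q         # about 0.4: the quantization error for this sample
print(q, error)
```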
Quantization noise depends on two factors: the input signal itself, and the accuracy with which the signal is represented in digital form. The role of the input signal becomes clear by comparison with an analog tape recorder, where the tape imposes a soft halo of noise that continues even through periods of recorded silence. In a digital system, by contrast, there is no quantization noise when nothing (or silence) is recorded. In other words, if the input signal is silence, then the signal is represented by a series of samples, each of which is exactly zero. The small differences shown in figure 4.2c disappear for such a signal, which means that the quantization noise disappears. If the input signal is a pure sinusoid, however, then the quantization error is not a random function but a deterministic truncation effect that can be audibly gritty at low levels (Maher 1992; Stuart and Craven 2019). We discuss this further in the section on dithering.
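The contrast between silence and a pure tone can be checked numerically. The following is a minimal Python sketch, assuming samples are floats in [-1, 1) that are quantized by rounding to the nearest step (a truncating converter, as in the 53-to-54 example, would use `math.floor` instead); the function names and the 32-sample period are illustrative choices, not part of any standard.

```python
import math

def quantize(x, bits):
    """Round x (a fraction of full scale in [-1, 1)) to the nearest step of a signed bits-bit grid."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))  # nearest integer code, clipped
    return q / scale

def quantization_error(signal, bits):
    """Per-sample difference between the quantized and the original signal."""
    return [quantize(x, bits) - x for x in signal]

PERIOD = 32                                   # samples per cycle (assumed)
cycle = [0.3 * math.sin(2 * math.pi * n / PERIOD) for n in range(PERIOD)]
sine = cycle * 4                              # four identical cycles of a pure tone
silence = [0.0] * len(sine)

err_silence = quantization_error(silence, 8)
err_sine = quantization_error(sine, 8)

print(max(abs(e) for e in err_silence))       # 0.0: digital silence has no quantization noise
print(err_sine[:PERIOD] == err_sine[PERIOD:2 * PERIOD])  # True: the error repeats with the tone
```

Because the error is a fixed function of the input value, a periodic input yields a periodic error pattern, which is heard as distortion rather than as a neutral hiss.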
The second factor in quantization noise is the accuracy of the digital representation. In a linear PCM system that represents each sample value by an integer, quantization noise is directly tied to the number of bits that are used to represent a sample. As previously noted, this is the quantization level of a system. Figure 4.3 illustrates the effects of different quantization levels, comparing the resolution of 1-bit versus 4-bit quantization. In a linear PCM system generally, the more bits used to represent a sample, the less the quantization noise.
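The link between word length and noise can be measured directly. This sketch (Python, standard library only) quantizes a sine to several bit depths and computes the signal-to-quantization-noise ratio; the common rule of thumb for a full-scale sine is about 6.02 dB per bit plus 1.76 dB. The test tone's amplitude of 0.9 (chosen to avoid clipping at 4 bits) and its frequency are assumptions, so the measured values should land roughly 0.9 dB below the rule.

```python
import math

def quantize(x, bits):
    """Round x in [-1, 1) to the nearest step of a signed bits-bit grid, with clipping."""
    scale = 2 ** (bits - 1)
    return max(-scale, min(scale - 1, round(x * scale))) / scale

def sqnr_db(bits, amplitude=0.9, n=48000, cycles=997):
    """Signal-to-quantization-noise ratio of a sine quantized to `bits` bits."""
    # 997 cycles in 48,000 samples: the tone does not line up with the sample grid
    sig = [amplitude * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
    err = [quantize(x, bits) - x for x in sig]
    p_signal = sum(x * x for x in sig) / n
    p_noise = sum(e * e for e in err) / n
    return 10 * math.log10(p_signal / p_noise)

for bits in (4, 8, 16):
    # Rule of thumb for a full-scale sine: about 6.02 dB per bit plus 1.76 dB
    print(bits, round(sqnr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```

Each added bit halves the step size and so lowers the noise power by a factor of four, which is where the figure of roughly 6 dB per bit comes from.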
Figure 4.3 Comparing the accuracy of 4-bit quantization with that of 1-bit quantization. The thin rounded curve is the input waveform. (a) 1-bit quantization provides two levels of amplitude resolution. (b) 4-bit quantization provides sixteen different levels of amplitude resolution.
Figure 4.4 shows the improvement in sine wave accuracy achieved by adding more bits of resolution, going from $2^4$ (sixteen possible values) to $2^8$ (256 possible values). Consider the improvement in precision accrued by using a 24-bit sample with $2^{24}$, or more than 16.7 million, possible values.
Figure 4.4 Effect of quantization on sine wave smoothness. (a) Sine wave with ten levels of quantization, corresponding to a moderately loud tone emitted by a 4-bit system. (b) Smoother sinusoid emitted by an 8-bit system.
The process of sampling can be viewed as fitting a waveform to a grid of time versus amplitude, as shown in figure 4.5. In general, the finer the grid, the better the approximation to the original waveform. More specifically, the finer the time grid (or sampling rate), the greater the bandwidth. The finer the amplitude grid (or quantization level), the greater the dynamic range and the smaller the amount of noise.
Figure 4.5 The sampling grid. The horizontal axis is time. The vertical axis is amplitude. (a) Crude approximation of a sine waveform caused by low sampling rate and low quantization. (b) Increasing grid resolution results in a better approximation to the waveform. Greater increases in grid resolution would closely approximate the original waveform.
Although a digital system exhibits no noise when there is no input signal, at very low (but nonzero) signal levels, quantization noise takes a pernicious form. This gritty sound, called granulation noise or modulation noise, can be heard when low-level tones decay to silence. A very low-level signal triggers variations only in the lowest bit. These 1-bit variations look like a square wave, which is rich in odd harmonics. Consider the decay of a piano tone, which attenuates smoothly as its high partials roll off, until at the lowest level it changes character and becomes a harsh-sounding square wave. The harmonics of the square wave can extend even beyond the Nyquist frequency, causing aliasing and introducing new frequency components that were not in the original signal. These artifacts may go unnoticed if the signal is kept at a low monitoring level, but they become more obvious if the signal is heard at a high level or rescaled to a higher level (a common practice in electronic music). Hence it is important that the signal be quantized as accurately as possible at the input stage.
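The collapse of a faint tone into the lowest bit can be reproduced in a few lines. This Python sketch assumes 16-bit rounding quantization and models the tail of a decaying tone as a sine whose peak is only 0.8 of one quantization step; both figures are illustrative assumptions.

```python
import math

def to_code(x, bits=16):
    """Quantize x in [-1, 1) to a signed integer code by rounding, with clipping."""
    scale = 2 ** (bits - 1)
    return max(-scale, min(scale - 1, round(x * scale)))

LSB = 1 / 2 ** 15        # one 16-bit step as a fraction of full scale
PERIOD = 100             # samples per cycle (assumed)

# The tail of a decaying tone: a sine whose peak is only 0.8 of one step
tail = [0.8 * LSB * math.sin(2 * math.pi * n / PERIOD) for n in range(PERIOD)]
codes = [to_code(x) for x in tail]

print(sorted(set(codes)))   # [-1, 0, 1]: only the lowest bit ever changes
```

The smooth sine has become a stepped pulse wave that jumps between three codes, and it is the sharp edges of these steps that generate the odd harmonics described above.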
To confront low-level quantization problems, some digital recording systems take a seemingly strange action. They introduce a small amount of uncorrelated noise—called dither—to the signal prior to analog-to-digital conversion (Vanderkooy and Lipshitz 1984; Lipshitz et al. 1992; Stuart and Craven 2019). This causes the ADC to make random variations around the low-level signal, which smooths out the pernicious effects of square wave harmonics (figure 4.6). With dither, the quantization error, which is usually signal-dependent, is turned into a wideband noise that is uncorrelated with the signal. For decrescendos like the piano tone mentioned previously, the effect is that of a soft landing as the tone fades smoothly into a bed of low-level random noise. The amount of added noise is usually on the order of 3 dB, but the ear can reconstruct musical tones whose amplitudes fall below that of the dither signal.
Figure 4.6 Dither reduces harmonic distortion. (Top) Original signal. (Bottom) Postdithered signal.
Dithering is recommended in recording at a 16-bit quantization level. Adding dither may not be necessary in recording with a 24-bit converter because the lowest bit represents an extremely soft signal. But in requantizing signals from a 24-bit to a 16-bit format, for example, dithering is recommended to preserve signal fidelity (Stuart and Craven 2019). Many dithering algorithms exist, so sound editors and mastering plug-ins often provide a variety of options.
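Word-length reduction with dither can be sketched as follows. This is one simple approach among the many dithering algorithms mentioned above: it assumes triangular-probability (TPDF) dither spanning plus or minus one 16-bit step, formed as the difference of two uniform random numbers, followed by rounding. The function name is hypothetical, and no noise shaping is applied.

```python
import random

STEP = 256  # one 16-bit step expressed in 24-bit units (2**24 / 2**16)

def requantize_24_to_16(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to signed 16-bit codes.

    If rng is given, triangular-PDF (TPDF) dither spanning +/-1 16-bit step
    is added before rounding; otherwise the samples are simply rounded.
    """
    out = []
    for s in samples_24:
        x = s / STEP                                  # value in 16-bit steps (may be fractional)
        if rng is not None:
            x += rng.random() - rng.random()          # difference of two uniforms = TPDF
        out.append(max(-32768, min(32767, round(x))))
    return out

# A steady level of a quarter of a 16-bit step: 64 units at 24-bit resolution
quiet = [64] * 20000
plain = requantize_24_to_16(quiet)
dithered = requantize_24_to_16(quiet, random.Random(1))

print(set(plain))                                # {0}: the level is lost entirely
print(round(sum(dithered) / len(dithered), 2))   # approximately 0.25: it survives on average
```

Without dither, anything below half a step simply vanishes. With dither, each output is noisy, but the average tracks the sub-step input, which is the numerical counterpart of the ear hearing tones that lie below the dither level.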
Audio converters can cause a variety of distortions (Blesser 1978; McGill 1985; Talambiras 1985; Pohlmann 2010). One such problem is that an n-bit converter is not necessarily accurate to the full dynamic range implied by its n-bit input or output. Although the resolution of an n-bit converter is one part in $2^n$, a converter’s linearity is the degree to which the analog and digital input and output signals match in terms of their magnitudes. That is, some converters use $2^n$ steps, but these steps are not linearly spaced, which causes distortion. Hence it is not unusual to encounter a 24-bit converter, for example, that is linear to only 19 bits. Refer to Pohlmann (2010) for a discussion of these issues.
In the digital audio systems that we have described, multibit linear DACs transform a binary sample value into an analog voltage in one step. That is, they convert a sample of 16 to 24 bits at each sample period. Correspondingly, multibit linear ADCs perform the inverse operation: converting an analog voltage into a multibit sample.
By contrast, oversampling converters use more samples in the conversion stage than are actually stored in the recording medium. For our purposes it is sufficient to present the basic ideas, leaving references for those who wish to investigate the topic further.
Oversampling is a family of methods for increasing the accuracy of converters. Most methods rely on 1-bit oversampling (Adams 1990; Hauser 1991; Reiss 2008). These methods convert just one bit at a time at a high sampling frequency.
The theory of 1-bit oversampling converters goes back to the 1950s (Cutler 1960), but it took years for this technology to become incorporated into digital audio systems. The 1-bit oversampling converters constitute a family of different techniques that are variously called sigma-delta, delta-sigma, noise shaping, bitstream, MASH, or direct stream digital (DSD), depending on the manufacturer. Class D amplifiers, popular in mobile phones and high-power sound reinforcement systems, incorporate this technology (Reiss 2008).
What these techniques have in common is that they sample one bit at a time at high sampling frequencies. That is, rather than trying to represent the entire waveform in a single sample, these converters measure differences between successive samples. They need to measure only whether the waveform has made a positive or negative excursion since it was last sampled, but they do this so frequently that the waveform does not have time to make a large excursion, and so 1 bit of information per sample period is sufficient.
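The up-or-down measurement described above is easiest to see in plain delta modulation, the simplest 1-bit scheme. The following Python sketch implements delta modulation, not the sigma-delta structure used in most commercial converters, and the oversampled rate and step size are assumptions chosen so that the step exceeds the largest change between successive samples.

```python
import math

def delta_modulate(signal, step):
    """1-bit delta modulation: each output bit says whether the input is above
    (1) or not above (0) the running estimate, which then moves by one step."""
    estimate = 0.0
    bits = []
    for x in signal:
        bit = 1 if x > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step):
    """Rebuild the staircase estimate by integrating the +/-step decisions."""
    estimate = 0.0
    out = []
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

RATE = 64 * 44_100   # oversampled rate (assumed)
STEP = 0.002         # step size (assumed); must exceed the largest per-sample change
sig = [0.5 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE // 100)]
bits = delta_modulate(sig, STEP)
recon = delta_demodulate(bits, STEP)
print(max(abs(a - b) for a, b in zip(sig, recon)))  # stays within about one step
```

If the step is too small for the signal's slope, the estimate cannot keep up (slope overload); a very high sampling rate keeps each per-sample change small, which is why 1 bit per sample suffices.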
1-bit converters take advantage of a fundamental law of information theory (Shannon and Weaver 1949), which says that one can trade off sample width for sample rate and still convert at the same resolution. That is, a 1-bit converter that oversamples at sixteen times the stored sample rate is equivalent to a 16-bit converter with no oversampling. They both process the same number of bits. The benefits of oversampling accrue when the sampling rate is much higher, meaning that the number of bits being processed is greater than the number of input bits.
One way to describe a sampling system is to determine the total number of bits being processed, according to the relation
Oversampling factor × Width of converter
For example, a 128-times oversampling system that uses a 1-bit converter is processing 128 × 1 bits each sample period. This compares to a traditional 16-bit linear converter that handles 1 × 16 bits, or one-eighth the data. In theory, the oversampling 1-bit converter should be much cleaner sounding.
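The arithmetic of the relation above is simple enough to state as code. The helper name in this Python sketch is hypothetical; the final line applies the same multiplication to the 64-times-44.1 kHz rate used by DSD, which gives the 2.8224 MHz figure cited below.

```python
def bits_per_sample_period(oversampling_factor, converter_width_bits):
    """Total bits handled per stored-sample period: oversampling factor x converter width."""
    return oversampling_factor * converter_width_bits

one_bit = bits_per_sample_period(128, 1)     # 128 one-bit conversions
multibit = bits_per_sample_period(1, 16)     # one 16-bit conversion
print(one_bit, multibit, multibit / one_bit)  # 128 16 0.125

# The same factor sets the 1-bit rate: 64 times 44.1 kHz gives the 2.8224 MHz DSD rate
print(64 * 44_100)                            # 2822400
```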
In any case, all the benefits of oversampling accrue to 1-bit converters, including increased resolution and phase linearity due to digital filtering. High sampling rates that are difficult to achieve with the technology of multibit converters are much easier to implement with 1-bit converters. Oversampling rates in the 2.8224 MHz and greater range permit highly accurate quantization per sample (Moorer 1996). The Pyramix digital audio workstation, developed in Switzerland, supports high-resolution DSD recording. DSD has evolved into a high-resolution download and streaming format for audiophiles, with some converters supporting oversampling rates in excess of 22 MHz (Melchior 2019).
Another technique commonly used in 1-bit oversampling converters is noise shaping, which can take many forms (Hauser 1991; Reiss 2008). The basic idea is that the requantization error that occurs in the oversampling process is shifted into a high-frequency range, either to a less noticeable frequency band or out of the audio bandwidth, by a highpass filter in a feedback loop with the input signal. This noise-shaping loop sends only the requantization error, not the audio signal, through the highpass filter.
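A first-order sigma-delta modulator is a compact illustration of an error-feedback loop of this kind. In this Python sketch (first order, whereas the next paragraph discusses second-order shaping), an integrator accumulates the difference between the input and the fed-back 1-bit output; in this standard structure the requantization error reaches the output filtered by (1 - z^-1), a first-order highpass response. The block-averaging decimator is a deliberately crude stand-in for a real lowpass filter, and the rates and tone are assumptions.

```python
import math

def sigma_delta_1st_order(signal):
    """First-order sigma-delta modulator with a 1-bit (+1/-1) quantizer."""
    integrator = 0.0
    y = 0.0                              # previous 1-bit output, fed back
    out = []
    for x in signal:
        integrator += x - y              # accumulate input minus fed-back output
        y = 1.0 if integrator >= 0 else -1.0
        out.append(y)
    return out

def decimate_by_averaging(bits, factor):
    """Crude lowpass-and-downsample: average each block of `factor` bits."""
    return [sum(bits[i:i + factor]) / factor
            for i in range(0, len(bits) - factor + 1, factor)]

OSR = 64          # oversampling ratio (assumed)
N_OUT = 200       # output samples to produce
FS_OUT = 44_100   # target sampling rate (assumed)
sig = [0.5 * math.sin(2 * math.pi * 441 * n / (FS_OUT * OSR)) for n in range(N_OUT * OSR)]
bits = sigma_delta_1st_order(sig)
pcm = decimate_by_averaging(bits, OSR)
ref = [sum(sig[i:i + OSR]) / OSR for i in range(0, len(sig), OSR)]
print(max(abs(a - b) for a, b in zip(pcm, ref)))   # small: the bitstream's average follows the input
```

Each individual output is only +1 or -1, yet the local average of the bitstream follows the input, because the integrator carries any accumulated error forward into later decisions.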
The final stage of any oversampling converter is a decimator that reduces the sampling rate of the signal to that required for storage (for an ADC) or playback (for a DAC) and also lowpass filters the signal. In a noise-shaping converter, this filtering also removes the requantization noise that was pushed above the audio band, resulting in dramatic improvements in signal-to-noise ratio. With second-order noise shaping (so called because of the second-order highpass filter used in the feedback loop), the maximum signal-to-noise ratio of a 1-bit converter is approximately 15 dB (2.5 bits) per octave of oversampling, minus a fixed 12.9 dB penalty (Hauser 1991). Even without noise shaping, each doubling of the sampling rate gains about 3 dB (half a bit) of in-band resolution, so running a 20-bit converter at 256 times (eight octaves above) the target sampling rate achieves 24-bit resolution, in theory. In practice, a host of issues in 1-bit oversampling are caused by feedback around a nonlinear quantizer. These include limit cycles, idle tones, distortion, dead zones, instability, and other scary phenomena (Reiss 2008).
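Both rules in the paragraph above can be turned into quick estimates. This Python sketch is an idealized calculation, not a model of any real converter. The second function uses a common textbook form for an ideal second-order modulator (an assumption here): 50 log10(OSR) supplies the roughly 15 dB per octave, 10 log10(pi^4/5) is the 12.9 dB penalty, and the quantizer's own 6.02N + 1.76 dB term, which the rule of thumb leaves out, is added explicitly.

```python
import math

def plain_oversampling_gain_bits(osr):
    """Extra resolution from oversampling alone (no noise shaping): half a bit per doubling."""
    return 0.5 * math.log2(osr)

def second_order_sqnr_db(osr, quantizer_bits=1):
    """Idealized peak SQNR of a second-order noise-shaping modulator (textbook form, assumed)."""
    return (6.02 * quantizer_bits + 1.76
            - 10 * math.log10(math.pi ** 4 / 5)   # the ~12.9 dB penalty
            + 50 * math.log10(osr))               # ~15 dB per octave of oversampling

print(20 + plain_oversampling_gain_bits(256))     # 24.0: the 20-bit, 256-times example
print(round(second_order_sqnr_db(64), 1))         # 85.2 dB for a 1-bit quantizer at 64x
```

Doubling the oversampling ratio raises the second-order estimate by 50 log10(2), about 15.05 dB, which matches the 2.5 bits per octave quoted in the text.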
Chapter 9 presents the basics on the related issue of sample-rate conversion, which is discussed in engineering texts in the context of multirate signal processing (Mitra 2006).
Audio samples can be stored on any digital medium, typically optical or magnetic disks, solid-state drives, and memory chips. On a given storage medium, data files can be stored in a variety of audio file formats. The software that performs the encoding and decoding of the audio data is called a codec (coder-decoder).
An audio file format like AIFF or WAVE is a data structure that divides the file into subsections, each of which contains a specific type of data. For example, one section stipulates the sample rate, bit resolution, and how many channels of digital audio are stored in the file. Another section contains the raw sample data. Other sections can contain pointers to markers, loop points, and other kinds of information such as file name, author, copyright, and so on.
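The sectioned layout of a WAVE file can be inspected with Python's standard `wave` module. This sketch writes a tiny stereo 16-bit file into memory and then reads back the header fields. In the files this module writes, the `fmt ` chunk carries the channel count, sample rate, and bit resolution, and the `data` chunk holds the raw samples; the ten frames of silence are an arbitrary choice.

```python
import io
import wave

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)          # stereo
    w.setsampwidth(2)          # 2 bytes per sample = 16-bit resolution
    w.setframerate(44100)      # sample rate in Hz
    w.writeframes(b"\x00" * 40)  # ten frames of silence: 2 channels x 2 bytes x 10

data = buf.getvalue()
print(data[0:4], data[8:12], data[12:16], data[36:40])  # b'RIFF' b'WAVE' b'fmt ' b'data'

buf.seek(0)
with wave.open(buf, "rb") as r:
    params = (r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
print(params)                  # (2, 2, 44100, 10)
```

AIFF follows the same chunked idea with different chunk names, and richer files add further chunks for markers, loops, or text metadata.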
New media, codecs, and formats are constantly being developed. Scientific advances push these technologies, but just as often the introduction of a new medium, codec, or format is driven by a commercial strategy.
Three types of file formats are common:
1. Uncompressed
2. Compressed lossless
3. Compressed lossy
An uncompressed format (e.g., AIFF or WAV) stores full-resolution PCM digital audio waveforms without any data reduction. This includes 16-bit and 24-bit formats at sample rates from 22.05 to 384 kHz. A special uncompressed format is 32-bit floating-point (FP), based on the IEEE 754 standard. This format encodes audio as a 24-bit mantissa (23 stored bits plus an implicit leading bit) scaled by an 8-bit exponent, resulting in an effective dynamic range of about 1,500 dB. FP recording is provided in field recorders made by companies such as Zoom and Sound Devices. Audio editors such as Audacity, Adobe Audition, and Reaper also support the FP format.
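The dynamic-range figures can be checked with a little arithmetic. This Python sketch compares integer PCM (full scale divided by one step) with IEEE 754 single precision, taking the largest finite value over the smallest normal value. Counting subnormal numbers would stretch the floating-point figure further, so the choice of normal numbers here is an assumption that matches the "about 1,500 dB" in the text.

```python
import math

def int_pcm_range_db(bits):
    """Ratio of full scale to one quantization step for integer PCM, in dB."""
    return 20 * math.log10(2 ** bits)

# IEEE 754 single precision: 24-bit significand, exponent range -126..127 for normal numbers
FLOAT32_MAX = (2 - 2 ** -23) * 2 ** 127
FLOAT32_MIN_NORMAL = 2 ** -126

float32_range_db = 20 * math.log10(FLOAT32_MAX / FLOAT32_MIN_NORMAL)
print(round(int_pcm_range_db(16), 1))   # 96.3
print(round(int_pcm_range_db(24), 1))   # 144.5
print(round(float32_range_db))          # 1529
```

In practice, the usefulness of FP recording lies less in this theoretical range than in the headroom it offers: loud peaks are far less likely to clip during recording and processing.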
An advantage of lossless formats is that one can transfer the bits from one medium to another in real time through digital input/output connectors (hardware jacks on the playback and recording systems) and standard digital audio transmission formats such as AES3 (two channels), S/PDIF (two channels), AES10 or MADI (56 channels), Ethernet 802.1 audio visual bridging (AVB) (200 channels), and Dante (1024 channels). Johns (2017) explains the proliferation of Ethernet audio formats.
A compressed lossless format, such as the Free Lossless Audio Codec (FLAC) or Apple Lossless Audio Codec (ALAC), exploits redundancies in the data to pack it more efficiently without losing any information. The file must be unpacked before it can be played, and after unpacking the original waveform is reconstructed perfectly. The ubiquitous ZIP and RAR file formats are also compressed lossless formats, but because they are not optimized for audio they typically achieve only about a 20 percent reduction in file size. By contrast, an audio-specific format like FLAC can achieve as much as a 50 percent reduction.
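The key property of lossless coding, bit-exact reconstruction, can be demonstrated with a toy coder. FLAC itself uses higher-order linear prediction and Rice coding of the prediction residuals. This Python sketch substitutes a first-order predictor (each sample predicted by the previous one) and general-purpose `zlib` compression, so it illustrates the principle but does not implement FLAC's format or match its efficiency.

```python
import math
import struct
import zlib

def encode(samples):
    """Store first-order prediction residuals (sample minus previous sample), then deflate."""
    residuals = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    raw = struct.pack(f"<{len(residuals)}i", *residuals)  # 32-bit ints: residuals may exceed 16 bits
    return zlib.compress(raw, 9)

def decode(blob):
    """Inflate, then undo the prediction by running summation."""
    raw = zlib.decompress(blob)
    residuals = struct.unpack(f"<{len(raw) // 4}i", raw)
    samples, acc = [], 0
    for r in residuals:
        acc += r
        samples.append(acc)
    return samples

pcm = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(44100)]
blob = encode(pcm)
assert decode(blob) == pcm            # bit-exact reconstruction
print(len(blob), 2 * len(pcm))        # compressed size vs. size as 16-bit PCM
```

Smooth material yields small residuals that compress well, whereas noisy material yields large residuals and little reduction. This is why lossless ratios vary from recording to recording.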
A compressed lossy format, such as MP3, Vorbis, Advanced Audio Coding (AAC), ATRAC, or Windows Media Audio, analyzes the input file to determine what information can be thrown away in order to meet a target bit rate. The goal is not only small file size but also short download time. The audio quality of these formats tends to be mediocre.
A 128 kbit/s MP3 file is about eleven times smaller than an uncompressed 16-bit 44.1 kHz compact disc track of the same song. The codecs associated with these formats usually let users choose the degree of data reduction or loss when the file is encoded. The term lossy refers to the fact that as MP3 encodes audio signals, it discards parts of sound that are deemed less audible in order to compress the size of a sound file. Specifically, MP3 throws away frequency components that are spectrally masked by neighboring frequencies, as well as tones that are temporally masked by loud events that occur less than about 5 ms before them (Hacker 2000).
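The "about eleven times" figure follows directly from the compact disc bit rate. This Python sketch reproduces it and also converts both rates into bytes per minute of audio, taking 1 kbit as 1,000 bits.

```python
CD_RATE = 44_100        # samples per second per channel
CD_BITS = 16            # bits per sample
CD_CHANNELS = 2

cd_bps = CD_RATE * CD_BITS * CD_CHANNELS    # 1,411,200 bit/s
mp3_bps = 128_000                           # 128 kbit/s

print(cd_bps, cd_bps / mp3_bps)             # 1411200 11.025
# Bytes needed for one minute of audio at each rate
print(cd_bps * 60 // 8, mp3_bps * 60 // 8)  # 10584000 960000
```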
Basing audio compression on psychoacoustic factors is called perceptual coding (Gibson et al. 1998). The amount of lossy compression is controlled by the bit rate, which for MP3 typically ranges from 32 to 320 kbit/s, and by the sampling rate (32 to 48 kHz). The bit rate directly affects both file size and audio quality; a low bit rate means lower audio resolution. One encoding option is variable bit rate (VBR), which uses a low bit rate for simple passages (e.g., sustained tones) and a high bit rate for more complex passages, such as transient or noisy events.
In essence, an MP3 encoder subdivides the input signal into thirty-two spectral bands and measures the energy in each of these bands over time. Applying perceptual coding techniques, it then greatly reduces the analyzed data. For example, if the content in any given band falls below a threshold of audition (varying according to frequency), the encoder discards that band.
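The band-energy step can be illustrated with a much simplified model. A real MP3 encoder uses a polyphase filterbank followed by an MDCT, guided by a psychoacoustic model with a frequency-dependent masking threshold. This Python sketch substitutes a plain DFT grouped into 32 equal-width bands and a single flat threshold; the threshold value, frame length, and test partials are all hypothetical choices.

```python
import cmath
import math

def band_energies(frame, n_bands=32):
    """Energy in n_bands equal-width bands of the frame's positive-frequency DFT bins."""
    n = len(frame)
    half = n // 2
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(half)]
    width = half // n_bands
    return [sum(abs(spectrum[k]) ** 2 for k in range(b * width, (b + 1) * width))
            for b in range(n_bands)]

def bands_to_keep(energies, threshold):
    """Indices of bands whose energy reaches the (hypothetical, flat) threshold."""
    return [b for b, e in enumerate(energies) if e >= threshold]

N = 128  # frame length (assumed): 64 positive bins, so 2 bins per band
# A loud partial in DFT bin 10 plus a very quiet one in bin 50
frame = [math.sin(2 * math.pi * 10 * t / N) + 0.001 * math.sin(2 * math.pi * 50 * t / N)
         for t in range(N)]
energies = band_energies(frame)
print(bands_to_keep(energies, threshold=1.0))  # [5]: the quiet band 25 is discarded
```

Only the data for the surviving band would need to be coded. Because a real encoder's threshold rises near loud components (masking) and varies with frequency, it can discard far more than this flat-threshold toy does.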
MP3 playback is performed by what is basically an additive synthesizer. Designed for mass distribution, MP3 files are often played back on cheap loudspeakers and earbuds where high audio quality is not a consideration. For an analysis of the myriad problems of sound quality in MP3 files, read Corbett (2012).
Formats like MP3 and AAC (used in YouTube audio) were conceived when data storage was expensive and network speeds were measured in hundreds of bits per second. Today storage is cheap and network speeds are measured in gigabits per second. Thus the need for music distribution via lossy formats is diminished, much to the relief of recording engineers (Faulkner 2011). The audiophile market has largely moved to high-resolution lossless download and streaming formats such as FLAC, ALAC, and DSD (Melchior 2019).
Experiments at Bell Telephone Laboratories
Music III: The Modular Unit Generator Concept
This chapter ambles through the rich history of computer-generated sound, leading to the sophisticated concept of modular unit generators.
Early digital computers did not have visual displays. In order to better understand their operation, programmers sought to sonify computational processes. For example, when a computer was properly operating, it produced audible radio interference. Technicians found it useful to put a radio near the computer to monitor its operation. When the sound stopped, it indicated that the computer had halted. Programmers soon figured out that the sound of the radio interference correlated with the logic of the program. Specifically, when a program executed a repeating loop at an audio rate, the sound was a sustained pitch. For amusement, they wrote programs containing loops of varying lengths that corresponded to the melodies of popular songs.
Some computers had a loudspeaker that could be used as an output device to signal that a particular event had occurred in a program, such as program termination. On these machines, raw 1-bit pulses on the serial output bus could be sent to a speaker. A periodic loop of pulses interspersed with delays produced a pitched tone that was called a “blurt” or “hoot.”
In 1949, Frances E. “Betty” Holberton programmed the BINAC computer to play “For He’s a Jolly Good Fellow” to the team who built the machine at the Eckert-Mauchly Computer Corporation in Philadelphia (Irrlichtproject 2015). In 1951, both the Australian CSIRAC and the British Manchester Mark II played popular tunes via this method (Doornbusch 2005; Link 2007). These efforts were never intended as formal research, and no scientific papers were published.
The technology of digital-to-analog converters (DACs) for sound samples did not exist at this time. This meant that there was no possibility for generalized waveform synthesis.
The most general way to control all aspects of a computer-generated waveform is to synthesize it according to the 1928 theory of sampling devised by Harry Nyquist, a communications researcher at Bell Telephone Laboratories. The first experiments in the synthesis of sound samples by computer were carried out in 1957 by researchers at Bell Telephone Laboratories in Murray Hill, New Jersey (David, Mathews, and McDonald 1958; Roads 1980; Wood 1991).
With a doctorate in electrical engineering from MIT, Max V. Mathews (figure 5.1) worked at Bell Labs under the direction of J. R. Pierce. Mathews set out to use a computer to produce musical tones. He wrote a program that would make a computer generate a sequence of binary numbers representing successive amplitudes of a musical sound wave (i.e., samples) (Mathews and Guttman 1959). In their experiments, Mathews and his colleagues proved that a computer could synthesize sounds according to any pitch scale or waveform, including time-varying frequency and amplitude envelopes and polyphony.
Figure 5.1 Max V. Mathews, 1981.
Their first programs were written for a giant IBM 704 computer (figure 5.2).
Figure 5.2 IBM 704 computer, 1957.
Computers were so rare at that time that the computation had to be carried out at IBM World Headquarters on Madison Avenue and 57th Street in Manhattan. Use of the 704 computer was billed at $600 per hour in 1957 dollars (Johnstone 1994). The logic circuits of the computer were made using vacuum tubes (figure 5.3). The 704 was a powerful machine for its day, with a 36-bit word length and a built-in floating-point processor for fast numerical operations. It could be loaded with up to 32 kwords of magnetic core memory and execute up to 4,000 multiplications per second. Sound synthesis calculations took hours. The samples were written to a digital magnetic tape. A 12-bit vacuum tube DAC transformed the samples into sound. This converter, designed by Bernard Gordon, was at that time the only one in the world capable of sound production (Roads 1980).
Figure 5.3 Vacuum tube logic module for the IBM 704 computer.
The Music I program developed by Mathews generated a single waveform: an equilateral triangle. A user could specify notes only in terms of pitch and duration (Roads 1980). A perceptual psychology researcher named Newman Guttman made one composition with Music I, a monophonic etude called In a Silver Scale written on May 17, 1957 (Guttman 1980). This was the first composition synthesized by the process of digital-to-analog conversion. Recognizing the potential of the computer to generate any frequency precisely, Guttman used the piece as an experiment to contrast two microtonal scales, an equal-beating chromatic scale described by Silver (1957) and just intonation.
Mathews completed Music II in 1958. It was written in assembly language for the IBM 7090 computer, an improved computer along the lines of the IBM 704. The 7090 ran several times faster than the older machines. It was thus possible to implement more ambitious synthesis algorithms. Four independent voices of sound were available, with a choice of sixteen waveforms stored in memory. Music II was used by several researchers at Bell Telephone Laboratories, including Max Mathews, John Pierce, and Newman Guttman.
A concert of the new computer music was organized in 1958 in New York City, followed by a discussion panel moderated by John Cage. Later that year Guttman played his computer-synthesized composition Pitch Variations at Hermann Scherchen’s villa in Gravesano, Switzerland, where composer Iannis Xenakis was in the audience (Guttman 1980). (We encounter Xenakis in chapters 24, 25, and 50.)
The most important development in the design of sound synthesis software was the concept of modular unit generators (UGs). UGs are signal processing modules like oscillators, filters, and amplifiers that can be interconnected to form synthesis instruments or patches that generate and process sound signals. (In subsequent chapters we discuss UGs in more detail.)
The first synthesis language to make use of the modular unit generator concept was Music III, programmed by Mathews and his colleague Joan E. Miller in 1960. Music III let users design their own synthesis networks out of UGs. By passing the sound signal through a series of such unit generators, a large variety of synthesis algorithms could be implemented relatively easily. The flexible design of Music III also enabled multivoice composition (polyphony), albeit with a corresponding increase in computation time.
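The unit generator idea can be sketched in a modern language. In the following Python sketch, a single oscillator class plays three roles: an envelope reader whose output drives the amplitude input of an audio oscillator, and a low-frequency oscillator whose output drives its frequency input (vibrato), in the spirit of the Music V patch shown in figure 5.5. The class name, the `tick` method, the table sizes, and the sample rate are hypothetical illustrations, not Music N syntax, and the oscillator truncates its table index rather than interpolating.

```python
import math

class TableOscillator:
    """Table-lookup oscillator unit generator with amplitude and frequency inputs."""

    def __init__(self, table, sample_rate):
        self.table = table
        self.sample_rate = sample_rate
        self.phase = 0.0  # current read position, in table samples

    def tick(self, amplitude, frequency):
        """Return one output sample and advance the phase (index truncated, no interpolation)."""
        n = len(self.table)
        out = amplitude * self.table[int(self.phase)]
        self.phase = (self.phase + frequency * n / self.sample_rate) % n
        return out

SR = 8000                                  # sample rate (assumed)
SINE_TABLE = [math.sin(2 * math.pi * i / 512) for i in range(512)]
# Envelope table: linear rise, then linear fall
ENV_TABLE = [i / 255 for i in range(256)] + [1 - i / 255 for i in range(256)]

def play_note(freq, duration_s, peak, vib_rate=5.0, vib_depth=6.0):
    """Hypothetical patch: envelope UG -> amplitude input; LFO UG -> frequency input."""
    env = TableOscillator(ENV_TABLE, SR)
    lfo = TableOscillator(SINE_TABLE, SR)
    osc = TableOscillator(SINE_TABLE, SR)
    samples = []
    for _ in range(int(duration_s * SR)):
        amp = env.tick(peak, 1.0 / duration_s)    # one pass through the envelope per note
        f = freq + lfo.tick(vib_depth, vib_rate)  # vibrato: +/- vib_depth Hz around freq
        samples.append(osc.tick(amp, f))
    return samples

note = play_note(440.0, 0.5, 0.8)
```

Because every module exposes the same kind of inputs and outputs, new instruments are built by rewiring existing units rather than by writing new signal-processing code, which is the flexibility that Music III introduced.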
Bell Labs published the historic recording Music from Mathematics in 1961 (figure 5.4). It featured studies in computer-generated sound by Pierce, Mathews, Guttman, and David Lewin, along with an excerpt of the Illiac Suite for String Quartet, an algorithmic composition by Lejaren Hiller (Hiller and Isaacson 1959). Chapter 50 has more on Hiller and algorithmic composition.
Figure 5.4 Cover of the author’s copy of Music from Mathematics vinyl record published by Bell Telephone Laboratories in 1961.
Since the time of Music III, a family of software synthesis systems, all based on the unit generator concept, has been developed by various researchers. Music IV was a re-coding of Music III in a new macro assembly language developed at Bell Laboratories called BEFAP (Tenney 1963, 1969).
Music V (figure 5.5), developed in 1968, was the culmination of Max Mathews’s efforts in software synthesis (Mathews 1969). Written almost exclusively in FORTRAN IV—a standard computer language at the time—Music V was exported to universities and laboratories around the world in the early 1970s. For many musicians, including the author of this book, it opened a door to the art of digital sound synthesis.
Figure 5.5 Music V synthesis patch. Lines 1–7 define the instrument, which features an envelope generator. The envelope F1 is defined in the GEN statement in line 8. A low-frequency sine wave oscillator modulates the frequency of another oscillator to produce vibrato (Mathews 1969).
Taking Music IV or Music V as a model, others have developed synthesis programs such as Music 4BF, Music 360, Music 7, Music 11, Csound, MUS10, Cmusic, Common Lisp Music, SuperCollider, ChucK, Synthesis ToolKit, Nyquist, and Max. As a general category these programs are often referred to under the rubric Music N languages (see chapter 48).
As the inventor of the modular unit generator synthesis paradigm for generalized waveform synthesis, the late Max V. Mathews (1926–2011) can rightly be called the father of computer-generated sound. Synthesis based on modular graphs of unit generators remains to this day the standard for flexible and experimental synthesis. Music N languages deliver this capability to anyone willing to learn to program.
Algorithm for a Digital Oscillator
Wavetable Lookup Noise and Interpolating Oscillators
Alternatives to Wavetable Lookup
Digital synthesis generates a stream of numbers representing the samples of an arbitrary audio waveform. We can hear these synthetic sounds by sending the samples through a digital-to-analog converter (DAC), which converts the numbers to a continuously varying voltage that can be amplified and sent to a loudspeaker.
A flexible method of waveform synthesis is to scan a prestored wavetable in memory. This process is called wavetable lookup synthesis. Wavetable lookup synthesis is the core operation of a digital oscillator—a fundamental sound generator.
Let us now walk through the process of table lookup. Suppose that the value of the first sample is given by the first number in the wavetable (shown in figure 6.1 at index location 0). For each new sample to be produced by this simple synthesizer, take the next sample from the wavetable. At the end of the wavetable, simply go back to the beginning and start reading out the samples again. The process is also called fixed-waveform synthesis because the waveform does not change over the course of a sound event.
Figure 6.1 Graphical depiction of wavetable lookup synthesis. The list 0–24 in the lower portion contains numbered locations or table index values. An audio sample value is stored in memory for each index point. The samples are depicted as the rectangles outlining a sine wave in the top portion. For example, Wavetable[0] = 0, and Wavetable[6] = 1. To synthesize the sine wave, the computer looks up the sample values stored in successive index locations and sends them to a DAC, looping through the table repetitively.
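The walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not code from any synthesis language: it assumes a 24-entry table holding one sine cycle (so that, as in figure 6.1, index 0 holds 0 and index 6 holds 1), and the function names `make_sine_table` and `fixed_waveform` are hypothetical.

```python
import math

def make_sine_table(size):
    """Fill a wavetable with one cycle of a sine wave sampled at `size` points."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def fixed_waveform(table, num_samples):
    """Emit one table entry per output sample, looping back to index 0 at the end."""
    out = []
    index = 0
    for _ in range(num_samples):
        out.append(table[index])
        index += 1
        if index == len(table):  # end of the wavetable: start reading again
            index = 0
    return out

table = make_sine_table(24)
samples = fixed_waveform(table, 48)  # two complete cycles of the same waveform
```

Because the table is read one entry per sample and never altered, every cycle of the output is identical, which is why this is called fixed-waveform synthesis.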
For example, let us assume that the table contains N = 1,000 entries, each of which is a 16-bit number. The entries are indexed from 0 to 999. We call the current location in the table the phase_index value, with reference to the phase of the waveform. To read through the table the oscillator starts at the first entry in the table (phase_index = 0) and moves by an increment to the end of the table (phase_index = 999). At this point the phase index wraps around the ending point to the beginning of the wavetable and starts again (figure 6.2).
Figure 6.2 The phase index or phasor (a ramp function) goes from 0 to N two times, creating two cycles of the sine wave.
What is the frequency of the sound produced by table-lookup synthesis? It depends on the length of the wavetable and the sampling frequency. Logically, it should be obvious that if one reads through the entire wavetable (no matter its length) in one second, the result is a wave at 1 Hz; reading through it 100 times a second makes a tone at 100 Hz, and so on.
Let us now be more specific about wavetable length and sampling frequency. If the sampling frequency is 50,000 samples per second and there are 1,000 numbers in the table, the result is 50,000 / 1,000: a low tone at 50 Hz. Likewise, if the sampling frequency is 100,000 Hz and the table contains 1,000 entries, then the output frequency is 100 Hz, because 100,000 / 1,000 = 100.
How is it possible to change the frequency of the output signal? As we have just seen, one simple way is to change the sampling frequency. But this strategy has problems, particularly when one wants to process or mix signals with different sampling rates. A better solution is to scan the wavetable at different rates by skipping samples (to shift pitch up) or repeating samples (to shift pitch down). These processes, in effect, resize the wavetable in order to generate different frequencies.
For example, if we take only the even-numbered samples, then we go through the table twice as fast. This raises the pitch of the output signal by an octave. If we skip two samples, then the pitch is raised further (by an octave and a fifth, to be exact). To shift the pitch down we repeat samples. For example, to shift the pitch down an octave, we play each sample twice.
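A small sketch of skipping and repeating entries, assuming a stand-in table whose entries equal their own indices so that the read order is visible. `scan` is a hypothetical helper; the phase is truncated to an integer index.

```python
import math

def scan(table, increment, num_samples):
    """Read a wavetable at a given rate, truncating the phase to a table index.

    increment > 1 skips entries (pitch up); increment < 1 repeats them (pitch down).
    """
    out = []
    phase = 0.0
    for _ in range(num_samples):
        out.append(table[int(phase)])              # truncate to an integer index
        phase = (phase + increment) % len(table)   # wrap around at the end of the table
    return out

table = list(range(8))      # stand-in "waveform": each entry equals its index
print(scan(table, 1, 8))    # [0, 1, 2, 3, 4, 5, 6, 7]  every entry
print(scan(table, 2, 8))    # [0, 2, 4, 6, 0, 2, 4, 6]  twice as fast: one octave up
print(scan(table, 0.5, 8))  # [0, 0, 1, 1, 2, 2, 3, 3]  each entry twice: one octave down

# Skipping two entries (increment 3) triples the frequency:
print(12 * math.log2(3))    # about 19.02 semitones, i.e., an octave (12) plus a fifth (about 7)
```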
In the table-lookup algorithm, the phase increment determines the number of samples to be skipped or repeated. The increment is added to the current phase location in order to find the next location for reading the value of the sample. In the simplest example, where we read every sample from the table, the increment is 1. If we read only the odd- or even-numbered samples in the table, then the increment is 2.
We could say that the oscillator resamples the wavetable in order to generate different frequencies. That is, it skips or repeats values in the table by an increment added to the current phase location in the wavetable. Thus the most basic oscillator algorithm can be explained as the following two-step program:
1. phase_index = (previous_phase + increment) mod L
2. output = amplitude × wavetable[phase_index]
Step 1 of the algorithm contains an addition and a modulo operation (denoted mod L). The modulo operation divides the sum by the table length L and keeps only the remainder, which is always less than L, so the phase index never points past the end of the table. Step 2 contains a table lookup and a multiplication. This is relatively little computation, but it assumes that the wavetables are already filled with waveform values.
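The two-step program translates almost line for line into Python. In this sketch (the function and parameter names are illustrative), the lookup is done before the phase update so that the first sample reads index 0; the phase is kept as a real number and truncated for the lookup, as in a noninterpolating oscillator.

```python
import math

def oscillator(wavetable, amplitude, increment, num_samples):
    """Basic table-lookup oscillator.

    Step 2: output = amplitude * wavetable[phase_index]
    Step 1: phase_index = (previous_phase + increment) mod L
    """
    L = len(wavetable)
    phase = 0.0
    out = []
    for _ in range(num_samples):
        out.append(amplitude * wavetable[int(phase)])   # step 2: lookup and scale
        phase = (phase + increment) % L                 # step 1: advance and wrap
    return out

table = [math.sin(2 * math.pi * i / 1000) for i in range(1000)]  # L = 1000
# With increment 50 at a 40,000 Hz sampling rate, one cycle lasts 1000 / 50 = 20
# samples, i.e., 40,000 / 20 = 2,000 Hz.
out = oscillator(table, 0.5, 50, 60)
```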
If the table length and the sampling frequency are fixed, as is usually the case, then the frequency of the sound emitted by the oscillator depends on the value of the increment. The relationship between a given frequency and an increment is given by the following equation, which is the most important equation in table lookup synthesis:

$$\text{increment} = \frac{L \times \text{frequency}}{\text{sampling frequency}} \qquad (1)$$
For example, if the table length L is 1,000, the sampling frequency is 40,000 Hz, and the specified frequency of the oscillator is 2,000 Hz, then the increment is 1,000 × 2,000 / 40,000 = 50.
This implies the following equation for frequency:

$$\text{frequency} = \frac{\text{increment} \times \text{sampling frequency}}{L}$$
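Equation 1 and its inverse can be written as two hypothetical helper functions and checked against the numbers used in this chapter:

```python
def phase_increment(frequency, table_length, sampling_rate):
    """Equation 1: increment = L * frequency / sampling rate."""
    return table_length * frequency / sampling_rate

def output_frequency(increment, table_length, sampling_rate):
    """Inverse relation: frequency = increment * sampling rate / L."""
    return increment * sampling_rate / table_length

print(phase_increment(2000, 1000, 40000))   # 50.0, the example in the text
print(output_frequency(1, 1000, 50000))     # 50.0: reading every entry of a 1,000-entry table
print(phase_increment(440, 1024, 44100))    # a typical case: not an integer
```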
So much for the theory of digital oscillators. Now we confront the computational realities.
All the variables in the previous example were multiples of 1,000, which led to a neat integer result for the value of the phase index increment. However, for most values of the table length, frequency, and sampling frequency in equation 1, the resulting increment is not an integer but rather a real number with a fractional part after the decimal point. Yet the way we look up a value in the wavetable is to locate it by its index, which is an integer. Thus we need to somehow derive an integer table index from a phase that accumulates a real-valued increment.
The real value can be truncated to yield an integer value for the table index. This means to delete the part of the number to the right of the decimal point, so, for example, 6.99 becomes 6 when it is truncated.
Suppose that we use an increment of 1.125. Table 6.1 compares the calculated phase index values with the truncated ones. The imprecision caused by truncation means that we obtain a waveform value near, but not precisely the same as, the one that we actually need. As a result, small amounts of waveform distortion are introduced, called table lookup noise (Moore 1977; Snell 1977b). Various remedies can reduce this noise. A larger wavetable is one prescription, because a fine-grained table reduces lookup error. Another is to round the phase index up or down to the nearest integer instead of simply truncating it; in that case, a value of 6.99 becomes 7, which is more accurate than 6. But the best performance is achieved by an interpolating oscillator. This is more costly from a computational standpoint, but it generates very clean signals.
Table 6.1 Phase index, table lookup list

| Calculated | Truncated |
|---|---|
| 1.000 | 1 |
| 2.125 | 2 |
| 3.250 | 3 |
| 4.375 | 4 |
| 5.500 | 5 |
| 6.625 | 6 |
| 7.750 | 7 |
| 8.875 | 8 |
| 10.000 | 10 |
| 11.125 | 11 |
| 12.250 | 12 |
| 13.375 | 13 |
| 14.500 | 14 |
| 15.625 | 15 |
| 16.750 | 16 |
| 17.875 | 17 |
| 19.000 | 19 |
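Table 6.1 can be regenerated with a short loop, assuming the phase starts at 1.000 and advances by 1.125 per sample. The third column is added here for comparison: it rounds half up (`math.floor(phase + 0.5)`) rather than using Python's built-in `round`, which rounds halves to the nearest even integer.

```python
import math

increment = 1.125
phase = 1.0                 # table 6.1 starts at phase 1.000
rows = []
for _ in range(17):
    truncated = math.floor(phase)           # drop the fractional part
    rounded = math.floor(phase + 0.5)       # nearest integer, halves rounded up
    rows.append((phase, truncated, rounded))
    phase += increment

for calculated, truncated, rounded in rows:
    print(f"{calculated:7.3f} {truncated:3d} {rounded:3d}")

# Truncation errs by up to 0.875 of a table entry here; rounding by at most 0.5.
print(max(c - t for c, t, _ in rows), max(abs(c - r) for c, _, r in rows))
```

Note that truncation never reads index 9 or 18 in this run: the lookup jumps from 8 to 10 and from 17 to 19, which is the source of the lookup error.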
An interpolating oscillator calculates what the value of the wavetable would have been, if it were possible to reference the wavetable at the exact fractional phase reached by adding the increment. In other words, it interpolates between adjacent entries in the wavetable to estimate the value that corresponds exactly to the specified phase index (figure 6.3).
Figure 6.3 Action of an interpolating oscillator. The graph shows two x points in a wavetable, at positions 27 and 28. The oscillator's phase index indicates that the value should be read at position 27.5. The interpolating oscillator uses a linear interpolation algorithm to generate a y value between the values stored at positions 27 and 28.
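A minimal linear-interpolation lookup, assuming a wavetable stored as a Python list; `interpolated_lookup` and `interpolating_oscillator` are hypothetical names. At phase 27.5 the lookup returns the midpoint between the values stored at positions 27 and 28, as in figure 6.3.

```python
import math

def interpolated_lookup(wavetable, phase):
    """Linearly interpolate between the two table entries that bracket `phase`."""
    L = len(wavetable)
    i = int(phase)                  # entry at or below the phase (e.g., 27)
    frac = phase - i                # fractional distance to the next entry (e.g., 0.5)
    y0 = wavetable[i % L]
    y1 = wavetable[(i + 1) % L]     # wrap so the last entry interpolates toward the first
    return y0 + frac * (y1 - y0)

def interpolating_oscillator(wavetable, amplitude, increment, num_samples):
    """Table-lookup oscillator that interpolates instead of truncating the phase."""
    L = len(wavetable)
    phase = 0.0
    out = []
    for _ in range(num_samples):
        out.append(amplitude * interpolated_lookup(wavetable, phase))
        phase = (phase + increment) % L
    return out

ramp = [float(i) for i in range(64)]
print(interpolated_lookup(ramp, 27.5))   # 27.5, halfway between entries 27 and 28
```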
With interpolating oscillators, smaller wavetables can yield the same audio quality as larger wavetables in a noninterpolating oscillator. Consider that for a 1024-entry wavetable used by an interpolating oscillator, the signal-to-noise ratio for a sine wave is an excellent 109 dB (worst case), as compared with the abysmal 48 dB for a noninterpolating oscillator using the same size wavetable (Moore 1977). These figures pertain to the case of linear interpolation; even better results are possible with more elaborate interpolation schemes (Chamberlin 1985; Crochiere and Rabiner 1983; Moore 1977; Snell 1977a; Dannenberg 1998).
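The difference in quality can be checked numerically. This sketch measures the signal-to-noise ratio of one sine tone over one second, with a 1024-entry table and an arbitrarily chosen non-integer increment (an assumed 441.7 Hz test tone at 44.1 kHz). It is an average over one test tone, not the worst-case measure Moore reports, so the exact numbers it prints may differ from those quoted above, but the interpolating oscillator should come out tens of decibels ahead.

```python
import math

def snr_db(reference, actual):
    """Signal-to-noise ratio in dB, treating (actual - reference) as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((a - r) ** 2 for a, r in zip(actual, reference))
    return 10 * math.log10(signal / noise)

L = 1024
table = [math.sin(2 * math.pi * i / L) for i in range(L)]
sampling_rate = 44100
increment = L * 441.7 / sampling_rate    # assumed test tone: 441.7 Hz

exact, truncated, interpolated = [], [], []
phase = 0.0
for _ in range(sampling_rate):           # one second of sound
    exact.append(math.sin(2 * math.pi * phase / L))   # ideal value at this phase
    i = int(phase)
    frac = phase - i
    truncated.append(table[i])                         # noninterpolating lookup
    interpolated.append(table[i] + frac * (table[(i + 1) % L] - table[i]))
    phase = (phase + increment) % L

print(f"truncating:    {snr_db(exact, truncated):.1f} dB")
print(f"interpolating: {snr_db(exact, interpolated):.1f} dB")
```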
In certain cases, numerical operations are much faster than memory table lookups, because memory access tends to be slower than executing microprocessor instructions. Modern microprocessors have native hardware instructions for sine and cosine generation, exponentials, square roots, and dot products. Thus, as pointed out by James McCartney (1997), waveforms like sine waves, exponentiated sine waves, formant oscillators, and chaotic oscillators can be generated much more efficiently by direct evaluation of their equation. Smith and Cook (1992) described a highly efficient sinusoidal oscillator based on digital waveguides (see also Smith 2010). Laroche (1998) showed how using resonant filters to synthesize time-varying sinusoids could be an efficient strategy for additive synthesis.
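One standard way to compute a sinusoid without any table is the recurrence y[n] = 2cos(ω)·y[n−1] − y[n−2], a two-pole resonator whose poles lie on the unit circle. The sketch below is offered only as an example of direct computation; it is not necessarily the method of McCartney, Smith and Cook, or Laroche, and `recursive_sine` is a hypothetical name.

```python
import math

def recursive_sine(frequency, sampling_rate, num_samples, amplitude=1.0):
    """Generate a sine wave with the recurrence y[n] = 2cos(w) y[n-1] - y[n-2].

    Once seeded with two samples of the desired sinusoid, the recurrence keeps
    oscillating with one multiply and one subtraction per sample and no table.
    """
    w = 2 * math.pi * frequency / sampling_rate
    coeff = 2 * math.cos(w)
    y_prev2 = 0.0                       # y[0] = amplitude * sin(0)
    y_prev1 = amplitude * math.sin(w)   # y[1] = amplitude * sin(w)
    out = [y_prev2, y_prev1]
    for _ in range(num_samples - 2):
        y = coeff * y_prev1 - y_prev2
        out.append(y)
        y_prev2, y_prev1 = y_prev1, y
    return out[:num_samples]

tone = recursive_sine(440.0, 44100, 1000)
```

Because rounding errors accumulate slowly in such a recurrence, long-running implementations sometimes reseed it periodically from exact values.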
This concludes our introduction to fixed-waveform table-lookup synthesis. The next chapter shows how aspects of synthesis can be varied over time with envelopes.
Envelopes, Unit Generators, and Patches
Graphic Notation for Synthesis Instruments
The previous chapter showed how to generate a sine wave at a fixed frequency. Because the maximum value of the sine wave does not change in time, the signal has a constant loudness. This is not terribly useful for musical purposes, because it allows control over only pitch and duration and no control over other sound parameters. Even if the oscillator reads from a different wavetable, it repeats that waveform ad infinitum. The key to more interesting sounds is time-varying waveforms, achieved by changing one or more synthesis parameters over the duration of a sound event.
To create a time-varying waveform, we need a synthesis instrument that can be controlled by envelopes—functions of time. For example, if the amplitude of the sound changes over its duration, the curve that the amplitude follows is called the amplitude envelope. A general way of designing a synthesis instrument is to imagine it as a modular system, containing a number of specialized signal-processing units that together create a time-varying sound.
As chapter 5 pointed out, the unit generator (UG) is a fundamental concept in digital synthesis. A UG is either a signal generator or a signal modifier. A signal generator (such as an oscillator) synthesizes signals such as musical waveforms and envelopes. A signal modifier, such as a filter, takes a signal as its input and transforms that input signal in some way.
To construct an instrument for sound synthesis, one interconnects UGs into a patch. The term patch derives from modular synthesizers in which sound modules are connected via patch cords. Of course, when the connections are done in software, no physical wires or cables are connected. When a UG produces a number at its output, that number can become the input to another UG.
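To make the idea of a patch concrete, here is a minimal pull-based sketch in Python. The classes `Constant`, `Oscillator`, and `Gain` and their `tick()` method are hypothetical, not the API of any synthesis language; each call to `tick()` produces one sample, and a UG's inputs are simply other UGs.

```python
import math

class Constant:
    """Signal generator that outputs the same value every sample."""
    def __init__(self, value):
        self.value = value

    def tick(self):
        return self.value

class Oscillator:
    """Table-lookup oscillator whose amplitude and frequency inputs are other UGs."""
    def __init__(self, wavetable, amplitude, frequency, sampling_rate):
        self.table = wavetable
        self.amplitude = amplitude      # any UG with a tick() method
        self.frequency = frequency      # any UG with a tick() method
        self.sampling_rate = sampling_rate
        self.phase = 0.0

    def tick(self):
        L = len(self.table)
        out = self.amplitude.tick() * self.table[int(self.phase)]
        increment = L * self.frequency.tick() / self.sampling_rate
        self.phase = (self.phase + increment) % L
        return out

class Gain:
    """Signal modifier: scales whatever signal is patched into its input."""
    def __init__(self, source, factor):
        self.source = source
        self.factor = factor

    def tick(self):
        return self.factor * self.source.tick()

# Patch: two Constant UGs feed the Oscillator, whose output feeds the Gain.
sine = [math.sin(2 * math.pi * i / 512) for i in range(512)]
osc = Oscillator(sine, Constant(0.8), Constant(440.0), 44100)
patch = Gain(osc, 0.5)
block = [patch.tick() for _ in range(1000)]
```

In this pull model a UG shared by two inputs would be ticked twice per sample; practical systems typically compute each UG once per sample or per block and pass its stored output along.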
Now we introduce the graphic notation often used in publications on digital sound synthesis to illustrate patches. This notation was invented to explain the operation of the first modular languages for digital sound synthesis, such as Music 4BF (Howe 1975) and Music V (Mathews 1969), and it is still useful.
The symbol for each unit generator has a unique shape. Figure 7.1 shows the graphic notation for a table-lookup oscillator called osc, a basic signal generator. It accepts three inputs: amplitude, frequency, and a waveform stored in wavetable f1. Its output signal is the waveform in f1 repeated at the stipulated frequency and amplitude.
Figure 7.1 Oscillator with sound waveform input f1 and fixed parameters for amplitude and frequency.
If we supply a constant number (for example, 1.0) to the amplitude input of an oscillator, then the overall amplitude of the output waveform is constant over the duration of each event. By contrast, most interesting sounds have an amplitude envelope that varies as a function of time. Typically, a note starts with an amplitude of 0, works its way up to some maximum value (usually normalized to be no greater than 1.0), and dies down again more or less slowly to 0. (A normalized wave has been scaled to fall within standard boundaries such as 0 to 1 for amplitude envelopes, or −1 to +1 for other waves.) The beginning part of the envelope is called the attack portion, and the end of the envelope is called the release.
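Peak normalization, a common way of scaling a wave into these boundaries, can be sketched as follows (`normalize_peak` is a hypothetical helper). Dividing by the largest absolute value keeps 0 at 0, so a non-negative envelope lands in 0 to 1 and a bipolar wave in −1 to +1.

```python
def normalize_peak(values, peak=1.0):
    """Scale values so that the largest absolute value equals `peak`.

    An envelope with all values >= 0 ends up in 0..peak; a bipolar
    waveform ends up in -peak..+peak. Zero stays at zero.
    """
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)          # silence: nothing to scale
    return [v * peak / largest for v in values]

envelope = normalize_peak([0, 3, 6, 4, 4, 0])   # now spans 0 to 1
wave = normalize_peak([0.0, 2.5, 0.0, -5.0])    # now spans -1 to +1
```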
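Many synthesizers define amplitude envelopes in the following four stages:

1. Attack: the amplitude rises from 0 to its peak.
2. Decay: the amplitude falls from the peak to the sustain level.
3. Sustain: the amplitude holds at the sustain level while the note is held.
4. Release: the amplitude falls from the sustain level back to 0 after the note ends.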
The usual acronym for such a four-stage envelope is ADSR (figure 7.2). The ADSR concept is useful for describing verbally the overall shape of an envelope; for example, “Make the attack sharper.” However, for specifying a musical envelope, four stages is limiting. More flexible envelope editors allow musicians to trace arbitrary curves.
Figure 7.2 Attack, decay, sustain, release (ADSR) amplitude envelope.
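A piecewise-linear ADSR envelope can be sketched as a list of samples. This assumes straight-line segments and a peak normalized to 1.0; many synthesizers use exponential curves instead, and `adsr` is a hypothetical function.

```python
def adsr(attack, decay, sustain_level, sustain_time, release, sampling_rate):
    """Piecewise-linear ADSR amplitude envelope as a list of samples.

    attack, decay, sustain_time, and release are durations in seconds;
    sustain_level is a fraction of the peak amplitude (peak = 1.0).
    """
    def ramp(start, end, seconds):
        # Straight line from start toward end; the endpoint itself is left
        # for the next segment to begin on.
        n = int(round(seconds * sampling_rate))
        return [start + (end - start) * i / n for i in range(n)]

    env = ramp(0.0, 1.0, attack)                                        # attack
    env += ramp(1.0, sustain_level, decay)                              # decay
    env += [sustain_level] * int(round(sustain_time * sampling_rate))   # sustain
    env += ramp(sustain_level, 0.0, release)                            # release
    env.append(0.0)                                                     # end exactly at 0
    return env

env = adsr(0.01, 0.02, 0.5, 0.05, 0.03, 1000)   # 1 kHz rate keeps the list short
```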
The instrument of figure 7.1 can be easily adapted to generate a time-varying amplitude by hooking up an envelope to the amplitude input of the oscillator. We are now closer to controlling the oscillator in musical terms. If we set the duration and the curve of the envelope, then the envelope controls the amplitude of each note.
Designing an envelope for every event in a composition is tedious. What we seek is a simple procedure for generating an envelope that can scale itself to the duration of diverse events. Figure 7.3 shows an envelope generator, env_gen. This UG takes in a duration, a peak amplitude, and a wavetable. It reads through the wavetable f1 over the specified duration, scaling it by the peak amplitude, so it adapts to tones of any duration. In this case, the sound waveform is f2.
Figure 7.3 Oscillator with amplitude envelope f1 and sound waveform f2.
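The behavior of env_gen can be sketched as follows, assuming integer-index reading of the envelope table and wavetables stored as Python lists. The Python functions `env_gen` and `note` are illustrative stand-ins for the UGs in figure 7.3, not code from Music V or any other language.

```python
import math

def env_gen(envelope_table, duration, peak_amplitude, sampling_rate):
    """Read an envelope table once over `duration` seconds, scaled by the peak.

    The read position advances in proportion to elapsed time, so the same
    table fits a note of any length: the first sample reads the first entry
    and the last sample reads the last entry.
    """
    n = int(round(duration * sampling_rate))
    L = len(envelope_table)
    if n <= 1:
        return [peak_amplitude * envelope_table[0]] * n
    # Integer arithmetic keeps the final index exactly at L - 1.
    return [peak_amplitude * envelope_table[i * (L - 1) // (n - 1)] for i in range(n)]

def note(f1, f2, duration, peak_amplitude, frequency, sampling_rate):
    """Oscillator with amplitude envelope f1 and sound waveform f2."""
    env = env_gen(f1, duration, peak_amplitude, sampling_rate)
    L = len(f2)
    increment = L * frequency / sampling_rate
    phase = 0.0
    out = []
    for amp in env:                      # the envelope drives the amplitude input
        out.append(amp * f2[int(phase)])
        phase = (phase + increment) % L
    return out

f1 = [0.0, 1.0, 0.5, 0.0]                                   # crude envelope shape
f2 = [math.sin(2 * math.pi * i / 512) for i in range(512)]  # one sine cycle
short_note = note(f1, f2, 0.1, 0.8, 440.0, 44100)
long_note = note(f1, f2, 2.0, 0.8, 440.0, 44100)
```

The same f1 shapes both notes; only the rate at which it is read changes.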
As the reader might guess, we could also attach an envelope generator to the frequency input of osc to obtain a pitch change such as vibrato or glissando. Indeed, we can interconnect envelopes, oscillators, and other unit generators in a wide variety of ways in order to make different sounds. This is the modular approach to sound synthesis.
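Patching a low-frequency oscillator into the frequency input gives vibrato. In this sketch (the function `vibrato_tone` is hypothetical), the LFO is computed with `math.sin` for brevity rather than with a second wavetable, and the phase increment is recomputed every sample from the instantaneous frequency.

```python
import math

def vibrato_tone(center_hz, depth_hz, rate_hz, duration, sampling_rate, table_size=1024):
    """Sine tone whose frequency input is driven by a low-frequency sine.

    Instantaneous frequency = center_hz + depth_hz * sin(2*pi*rate_hz*t).
    """
    table = [math.sin(2 * math.pi * i / table_size) for i in range(table_size)]
    n = int(round(duration * sampling_rate))
    phase = 0.0
    out, freqs = [], []
    for k in range(n):
        t = k / sampling_rate
        freq = center_hz + depth_hz * math.sin(2 * math.pi * rate_hz * t)  # LFO -> frequency
        freqs.append(freq)
        out.append(table[int(phase)])
        phase = (phase + table_size * freq / sampling_rate) % table_size
    return out, freqs

tone, freqs = vibrato_tone(440.0, 6.0, 5.0, 1.0, 44100)   # 440 Hz +/- 6 Hz at 5 Hz
```

Replacing the LFO with a steadily rising ramp would produce a glissando instead.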
Human beings are naturally sensitive to undulation, so it is not surprising that time-varying envelopes are the key to interesting sounds. Envelopes are profiles of gestures. As analogies of human body movements, they infuse life into otherwise lifeless electronic signals.
Types of Software Synthesizers
Graphical Instrument Patch Editors
Custom Apps Made Using General-Purpose Programming Languages
Real-Time and Non-Real-Time Synthesis
This chapter begins by comparing hardware versus software synthesis. We briefly review the history of software synthesis, survey various types of software synthesizers, look at real-time and non-real-time synthesis, and introduce resources for audio programming.
In the early days of digital synthesis, computers were too slow for real-time operation. What do we mean by real time? In this context, real time means that we can complete all calculations for a sample within the duration of one sample period. If a system takes ten seconds to compute one second of sound, it must operate as a non-real-time system. This means that the sound output must be written to a file to be auditioned later. Early computer music systems could not be played interactively in the presence of sound—a huge liability from a musical standpoint.
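The real-time condition is simple arithmetic. The helpers below are hypothetical: the first gives the time available per sample, the second compares computation time with sound duration.

```python
def sample_period_us(sampling_rate):
    """Time available to compute one sample, in microseconds."""
    return 1e6 / sampling_rate

def realtime_factor(compute_seconds, sound_seconds):
    """Computation time divided by sound duration; at most 1.0 means real time."""
    return compute_seconds / sound_seconds

print(f"{sample_period_us(48000):.2f} microseconds per sample at 48 kHz")  # 20.83
print(realtime_factor(10.0, 1.0))   # 10.0: the text's example, non-real-time
```

In practice, real-time systems usually compute samples in blocks, so the requirement must hold on average over each block rather than for every individual sample.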
Thus there was a strong motivation to design specialized digital hardware to synthesize sound in real time (Markowitz 1989; Alonso 1973; Alles and Di Giugno 1977; Snell 1977a, 1977b; Samson 1980, 1985; Asta et al. 1980; Wallraff 1979a; Loy 2013a, 2013b). Early hardware synthesizers were rare and expensive. A breakthrough product was the Yamaha DX7 (1983), a mass-produced keyboard instrument based on specialized chips for frequency modulation (FM) synthesis (chapter 16). Priced at less than $2,000, it sold more than 200,000 units. Much pop music of the 1980s was infused with the Yamaha FM sound.
Specialized hardware can be efficient but is limited in its flexibility. For example, the chips in the DX7 were designed for FM; they could not realize other synthesis methods. Thus, the most flexible approach to sound generation is a software synthesis program running on a computer. In software synthesis, all the calculations involved in computing a stream of samples are carried out by a computer without additional hardware.
An early example of software synthesis was the venerable Music V language (Mathews 1969), shown in figure 8.1. To create sound, one would write a program in the Music V language and then execute the program. This non-real-time process could take hours (Roads 2001a).
Figure 8.1 Excerpt of the author’s printout of the source code of Music V from 1974. The code is classic FORTRAN IV, with program control flow by GOTO statements. The first 16 lines are the end of the OUT unit generator. The rest is the beginning of the OSCIL unit generator. The code for a more modern version of Music V for GFortran and Linux is printed in Boulanger and Lazzarini (2011). GFortran continues to be maintained.
Figure 8.2 is a simple example of a Music V program. Figure 8.2(a) shows a conventional score of the notes to be synthesized. Figure 8.2(b) shows the patch for a simple instrument that will play the score. It consists solely of an oscillator and an output box. The oscillator has two inputs: the amplitude of the output is set by the fifth element in the note statement, designated P5, and the frequency by the sixth element, P6, which holds the encoded pitch. The waveform to be played is determined by the function F2, which is sketched in figure 8.2(c).
Figure 8.2 A simple orchestra and score in Music V (after Mathews 1969). Line numbers 1–17 have been added to simplify the explanation. Lines 1–4 in the code define instrument number 1 (an oscillator and an output). Line 5 generates the function F2. Lines 6–16 list note statements defining the score. Each note statement gives starting time, instrument number, duration, amplitude, and pitch (encoded in this case). Line 17 terminates the program.
By the mid-1970s, music synthesis languages could be run on minicomputers. Waiting times for synthesis jobs were reduced to minutes for most short-duration experiments and hours for intensive tasks like synthesizing entire pieces involving reverberation.
By the late 1990s, we began to see the first real-time synthesis software such as Csound (1990), SuperCollider (1996), Seer Reality (1997), and Max/MSP (1997) running on home computers.
Today a multitude of real-time software synthesizers are available. To cite one example, any user of the Native Instruments REAKTOR modular system can design and publish a synthesizer, so that thousands of synthesizers are available for this platform alone.
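Contemporary software synthesis can be divided into six general categories:

1. Closed apps
2. Patchable apps
3. Virtual modular synthesizers
4. Graphical instrument patch editors
5. Synthesis languages
6. Custom apps made using general-purpose programming languages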
Chapter 19 looks at virtual modular synthesizers. Chapter 47 surveys graphical instrument patch editors. Chapter 48 addresses textual synthesis languages.
Closed apps are designed to do one thing very well. They are optimized to realize a single synthesis technique or only a few variations. Their graphical interface reflects this specialization. They cannot be freely reprogrammed to realize any synthesis technique. Figure 8.3 shows the Native Instruments FM8, which emulates a Yamaha DX7 synthesizer. Even a closed app like FM8 provides some flexibility in patching. Its Expert page lets users design their own FM patches.
Figure 8.3 Native Instruments FM8. This software synthesizer is optimized for frequency modulation synthesis.
Many commercial software synthesizers can run either in standalone mode (i.e., on their own, without a host application) or as a plug-in extension to a digital audio workstation (DAW) or sound editor. The plug-in implementation has the advantage that multiple copies or instances of the plug-ins can run simultaneously, and they can be recorded by the DAW as they play.
Patchable apps offer a limited set of modules with the possibility of patching the modules to realize different synthesis techniques.
An example of a patchable app is Native Instruments Absynth, which features a semimodular design with twelve module slots (figure 8.4). One can choose among various types of oscillators, filters, and modulators to fill in the open slots.
Figure 8.4 Absynth patch window.
Another is Madrona Labs Aalto, a self-styled West Coast synthesizer following the example of the late Don Buchla. Like a Buchla system, it couples a complex oscillator with a lowpass gate. It also features two envelope generators, a waveguide delay, a filter, and a sequencer. Although its collection of modules is fixed, it offers more than two dozen patch points for interconnecting the modules (figure 8.5). According to the developer, it is by playing with the interconnections that unexpected sounds emerge (Jones 2020).
Figure 8.5 Madrona Labs Aalto, a patchable software synthesizer.
Unfiltered Audio’s versatile LION synthesizer (figure 8.6) goes further to offer an expandable number of virtual modulation sources that can be patched to control synthesis parameters.
Figure 8.6 Unfiltered Audio LION, a patchable software synthesizer.
The first wave of virtual modular software synthesizers consisted of simulations of vintage analog synths. Arturia worked with Robert Moog to introduce the Modular V synthesizer in 2003. Since then, emulations of vintage synths by ARP, EMS, and Buchla have appeared. Figure 8.7 shows the screen of Arturia’s EMS Synthi V software synthesizer.
Figure 8.7 Arturia EMS Synthi V software synthesizer.
New virtual modular synthesizers include Native Instruments REAKTOR Blocks, VCV Rack, Softube Modular, and Cherry Audio’s Voltage Modular (see chapter 19).
A graphical instrument editor such as Max, Pure Data (Pd), or AudioMulch lets one build synthesis patches by interconnecting icons on a screen (figure 8.8). The icons represent unit generators. The difference between what we have called a patchable app and a graphical instrument editor is that in the editor one can add any number of modules to a patch. Moreover, in a system like Max, there are hundreds of modules from which to choose. Once the editing is done, the result can be compiled to make a self-contained app.
Figure 8.8 Max patch for frequency modulation.
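The same idea can be expressed in text: each unit generator is an object that produces one sample on request, and a patch is a graph of such objects wired together. The following is a minimal Python sketch, assuming a simple pull model in which calling `tick()` on the last unit generator pulls one sample through the whole graph. The class names `UGen`, `Const`, `SinOsc`, and `Mul` are hypothetical and do not correspond to any particular editor's objects; real environments such as Max process signals in blocks (vectors) rather than one sample at a time.

```python
import math

class UGen:
    """Base class: a unit generator returns one new sample per call to tick()."""
    def tick(self):
        raise NotImplementedError

class Const(UGen):
    """A constant value, e.g., a fixed frequency typed into a patch."""
    def __init__(self, value):
        self.value = value

    def tick(self):
        return self.value

class SinOsc(UGen):
    """Sine oscillator whose frequency input is itself a unit generator."""
    def __init__(self, freq, sr=48000):
        self.freq = freq
        self.sr = sr
        self.phase = 0.0  # current phase in radians

    def tick(self):
        y = math.sin(self.phase)
        step = 2 * math.pi * self.freq.tick() / self.sr  # radians per sample
        self.phase = (self.phase + step) % (2 * math.pi)
        return y

class Mul(UGen):
    """Multiplies two input signals sample by sample."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def tick(self):
        return self.a.tick() * self.b.tick()

# A patch: a 440 Hz sine multiplied by a 5 Hz sine (ring modulation).
patch = Mul(SinOsc(Const(440.0)), SinOsc(Const(5.0)))
samples = [patch.tick() for _ in range(480)]
```

Because each call to `tick()` advances an oscillator, this naive pull model would advance a unit generator twice per sample if two others read from it; a real scheduler computes each unit generator once per sample or block and reuses the result.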
Using a language designed for synthesis, a musician can specify sounds by writing a text that is interpreted by a synthesis engine. This category is represented by languages such as Csound, Nyquist, ChucK, Faust, and SuperCollider. Figure 8.9 shows a simple FM synthesis instrument written in SuperCollider 3.
Figure 8.9 SuperCollider code for frequency modulation synthesis.
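The principle behind the instrument in figure 8.9 is compact enough to sketch outside SuperCollider. The following Python function is a minimal sketch of simple FM (one sine modulator, one sine carrier), not a transcription of the figure's code; the parameter names and the example values are assumptions chosen for illustration.

```python
import math

def fm_tone(carrier_hz, mod_hz, index, dur_s, sr=48000, amp=0.5):
    """Simple FM: one sine modulator, one sine carrier.

    y[n] = amp * sin(2*pi*fc*t + I * sin(2*pi*fm*t)),  t = n / sr
    The modulation index I sets the peak frequency deviation to I * fm,
    and sidebands appear at fc +/- k*fm.
    """
    out = []
    for n in range(round(dur_s * sr)):
        t = n / sr
        modulator = index * math.sin(2 * math.pi * mod_hz * t)
        out.append(amp * math.sin(2 * math.pi * carrier_hz * t + modulator))
    return out

# 0.1 s of a 440 Hz carrier, 110 Hz modulator, index 2 (peak deviation 220 Hz)
tone = fm_tone(440.0, 110.0, 2.0, 0.1)
```

Raising `index` spreads energy into more sidebands and brightens the tone; with `index` set to 0 the function returns a plain sine at the carrier frequency.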
Using a programming language like SuperCollider, one can compile new fully functioning apps. For example, PulsarGenerator (Roads 2001a, 2001b) was compiled in SuperCollider. Our granulation app EmissionControl was originally coded in a combination of SuperCollider and C (Thall 2005).
For the ultimate in customization and efficiency, one can program in a general-purpose language like C++. Most of the software synthesizers on the market are coded in C++. Code libraries are vital to programming audio applications because they provide solutions to many basic tasks that are common to these applications. Code libraries facilitate graphical user interface (GUI) design (e.g., ImGui and JUCE), audio input and output (e.g., PortAudio and RtAudio), MIDI control (e.g., RtMidi), signal processing algorithms (e.g., Pedal and STK), and music information retrieval using machine learning (e.g., the FluCoMa project). Our EmissionControl2 granular synthesis app was coded in C++ using the Allolib code library, which was designed to support immersive 3D audio and visuals.
For decades, Moore’s law (1965) predicted that transistor density would double each year, with a follow-on effect on clock speed. By 2012, this law began to lose relevance as the clock speed of microprocessors stalled at less than 4 GHz. Chip companies responded by manufacturing multicore processors, so that multiple processes could run in parallel, theoretically speeding up operations. However, many real-time interactive audio algorithms do not benefit from multithreaded (parallel) processing, often because each sample depends on values computed just before it (as in recursive filters and feedback loops), so the work cannot easily be divided among cores.
Every step in a sound synthesis algorithm takes a certain amount of time to execute. For a complicated algorithm, a computer cannot always complete the calculations necessary for a sample in the interval of one sample period. Moreover, the audio thread is competing with other operating system processes.
To understand this point more concretely, consider the following steps required to calculate one sample of sound by the table-lookup method of synthesis:
The important point here is that each step takes time. For example, let us say it takes a computer one microsecond (one millionth of a second) to perform the six calculations in the preceding list. (This is intentionally slow in order to make the explanation simple; modern computers are faster than this.) If we use a sampling rate of 50 kHz, the time available per sample is 1/50,000 of a second, or 20 microseconds. This means that in an ideal world, the computer could complete the calculations necessary for about twenty simple oscillators in real time.
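The arithmetic in this example can be written out as a pair of helper functions. This is a minimal sketch that adopts the text's deliberately slow, hypothetical cost of one microsecond per oscillator per sample; actual costs depend on the hardware and on everything else the system is doing.

```python
def sample_period_us(sample_rate_hz):
    """Length of one sample period in microseconds: T = 1 / fs."""
    return 1_000_000 / sample_rate_hz

def max_oscillators(cost_per_osc_us, sample_rate_hz):
    """Idealized count of oscillators computable within one sample period.

    Ignores interpolation, I/O, GUI updates, and competing OS processes,
    all of which shrink the real budget.
    """
    return int(sample_period_us(sample_rate_hz) // cost_per_osc_us)

print(sample_period_us(50_000))      # 20.0
print(max_oscillators(1.0, 50_000))  # 20
print(max_oscillators(1.0, 96_000))  # 10
```

Raising the sampling rate to 96 kHz roughly halves the time available per sample and, in this idealized model, cuts the count to ten oscillators.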
In the real world, the process is made more complicated by using interpolation in the table lookup; increasing the sampling rate; adding filters, delays, modulating oscillators, random functions, more channels, reverberation, and spatial processing; updating the graphical user interface; and allowing the time needed to interact with a musician. Here the calculations can become impossible to realize in real time. This is the domain of non-real-time synthesis.
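Interpolation is one concrete example of how each refinement adds work. The following minimal Python sketch of a table-lookup oscillator shows both truncating lookup (use the integer part of the phase and discard the fraction) and linear interpolation between adjacent table entries. The 1,024-entry sine table and the function name are assumptions for illustration, and other interpolation schemes (e.g., cubic) exist.

```python
import math

TABLE_SIZE = 1024
# One cycle of a sine wave, precomputed once.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def table_lookup_osc(freq_hz, n_samples, sr=48000, amp=1.0,
                     table=SINE_TABLE, interpolate=False):
    """Read a single-cycle wavetable with a phase accumulator.

    Assumes 0 <= freq_hz < sr / 2.
    """
    size = len(table)
    increment = freq_hz * size / sr   # table positions to advance per sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        i = int(phase)                # integer part of the phase picks an entry
        if interpolate:
            frac = phase - i          # fractional part weights the next entry
            nxt = table[(i + 1) % size]   # wrap to the start at the table end
            value = table[i] + frac * (nxt - table[i])
        else:
            value = table[i]          # truncation: the fraction is discarded
        out.append(amp * value)
        phase = (phase + increment) % size  # advance and wrap around the table
    return out
```

The interpolating branch costs a few extra operations per sample (an additional table read, subtractions, a multiplication, and an addition), but it greatly reduces the distortion caused by truncating the phase.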
Non-real-time operation means that there is a substantial delay between the time we start computing a sound and the time that we can listen to it. Non-real-time (or offline) synthesis was the only option in the primordial days of computer music. For example, a two-minute portion of J. K. Randall’s Lyric Variations for Violin and Computer (Cardinal Records VCS 10057), realized between 1965 and 1968 at Princeton University, took nine hours to compute. If a small adjustment was desired, the entire process would have to be repeated. Even though this was a laborious process, a handful of dedicated composers were able to create computer-synthesized works of music (Tenney 1969; Von Foerster and Beauchamp 1969; Dodge 1985; Risset 1985a).
For a more modern example, the method of atomic decomposition analysis, described in chapter 39, is currently a non-real-time technique. However, it is possible that someone with a powerful multicore computer could substantially increase the speed of this technique by programming it as a multithreaded process.
Many audio computations operate faster than real time. For example, a complex multitrack mix of a piece lasting several minutes can be rendered to a sound file in seconds using a digital audio workstation like Pro Tools, Logic, or Ableton Live.
Whether or not it takes longer than one sample period to compute each sample, software synthesis programs can generate a sound file as their output. Common sound file formats include Waveform Audio File Format (WAVE) and Audio Interchange File Format (AIFF), among many others. A sound file is simply an audio data file stored on a digital storage medium. It contains a header and numbers representing sound samples. The header describes the samples in the file (sampling rate, number of bits per sample, number of channels, etc.), and some formats also allow descriptive metadata such as a name or title.
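Python's standard `wave` module offers a quick way to see a header in action: the writer records the sampling rate, sample width, and channel count, and the reader recovers them along with the number of frames. A minimal sketch, assuming 16-bit PCM WAVE output (the `wave` module handles only uncompressed PCM WAVE files); the file name `tone.wav` and the tone parameters are illustrative.

```python
import math
import struct
import wave

def write_sine_wav(path, freq_hz=440.0, dur_s=1.0, sr=44100, channels=1):
    """Write a 16-bit PCM sine tone; the header records sr, width, channels."""
    n_frames = round(dur_s * sr)
    frames = bytearray()
    for n in range(n_frames):
        value = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / sr))
        # One little-endian 16-bit sample, repeated once per channel.
        frames += struct.pack('<h', value) * channels
    with wave.open(path, 'wb') as w:
        w.setnchannels(channels)
        w.setsampwidth(2)         # 2 bytes = 16 bits per sample
        w.setframerate(sr)
        w.writeframes(bytes(frames))

def read_header(path):
    """Return (sampling rate, bits per sample, channels, frames) from the header."""
    with wave.open(path, 'rb') as r:
        return (r.getframerate(), 8 * r.getsampwidth(),
                r.getnchannels(), r.getnframes())

write_sine_wav("tone.wav", dur_s=0.5, channels=2)
print(read_header("tone.wav"))  # (44100, 16, 2, 22050)
```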
As in other computer applications, many different audio file formats coexist. The need to convert between formats is a practical fact of life in computer music studios. Sound editors and utilities like SoundHack (Erbe 1992) perform these conversions. Lazzarini (2011c) teaches how to read and write sound files.
The 1970s saw the introduction of the first real-time digital synthesizers based on custom hardware (Markowitz 1989; Alonso 1973; Alles and Di Giugno 1977; Snell 1977a, 1977b; Samson 1980, 1985; Asta et al. 1980; Wallraff 1979a; Loy 1981, 2013a, 2013b; Alles 1977; Buxton et al. 1978; Strawn 1985c; Roads and Strawn 1985; Roads 1989).
By the 1980s, large circuit boards inside digital synthesizers were being replaced by tiny chips that could realize multivoice synthesis algorithms in real time. These chips could be fabricated en masse and embedded in inexpensive synthesizers made by manufacturers such as Yamaha. In the 1990s, similar chips were embedded in sound cards inserted into computers. As processors became faster, for a while it looked as if software synthesis would rule forever. Then the Eurorack phenomenon happened, and hardware synthesis (both analog and digital) made a return alongside software synthesis.
Figure 8.10 shows an overview of a real-time computer music synthesis system. This system has three ways of generating digital sound:
Figure 8.10 Overview of a real-time synthesis system.
The obvious advantage of real-time operation is that musical input devices or controllers can be played by the musician as sound is heard. Gestural control leads to more expressive performances. Moreover, the parameter space of a synthesis method can be explored and tested rapidly. Sequencers and score editors make it possible to record and edit performances. Chapters 40 and 41 present performance controllers and performance software, and chapters 51 and 52 cover the MIDI and OSC communication protocols.
This section provides a brief orientation to audio programming. Many tomes address audio programming in depth and detail. For example, The Audio Programming Book (Boulanger and Lazzarini 2011) contains over 3,000 pages of text and thousands of lines of code. Specific books describe each audio programming language (SuperCollider, Csound, Max, Pd, ChucK, FAUST, etc.). Neukom’s text (2013) uses Csound, Max, Mathematica, C/C++, and Processing. Another resource is The Audio Programmer video channel on YouTube.
A crucial element of audio programming is the management of audio I/O. Music-specific environments like SuperCollider and Max handle this automatically. By contrast, when coding in a language like C or C++, the programmer has more responsibility for ensuring the smooth stream of audio samples without stuttering. Fortunately, audio libraries such as libsndfile, PortAudio, and JUCE handle many of the low-level details of real-time input and output. In the audio software industry, ROLI’s JUCE is favored by many professional developers (Jones 2020).
Real-time synthesis generates a block of samples at a time. Typical block sizes range from 32 samples to 2,048 samples. Large buffers provide the most protection against glitches, but they introduce latency: a delay from the time that the sound is recorded or synthesized to the time that it is heard. The mechanism for implementing low-latency audio is the audio callback loop, which repeats as long as sound is being recorded or synthesized. The audio callback loop takes a block of samples from a synthesizer and puts them into a memory buffer. The block of samples plays by sending the contents of the buffer to the real-time output. However, synthesizing or applying effects to a block of samples may sometimes take longer than the block's own playing time (the block size multiplied by the sample period). Moreover, the processing speed of a software routine is not constant because it runs concurrently with operating system processes that can start at any time. Hence a scheme using a single output buffer is prone to audio dropouts. It is therefore common practice to use a double- or quad-buffering scheme in the audio callback loop. One might break a 2,048-sample buffer into two or four sections: one section plays while another is being filled with new samples by the synthesis algorithm. In some cases buffering can be managed automatically by the sound library. Maldonado (2011) walks readers through the steps of the process. The developer of AudioMulch, Ross Bencina (2011), warns of problems in the audio callback loop and provides a list of do’s and don’ts to avoid audio glitches. Figure 8.11 shows an example of an audio callback loop written in pseudocode.
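The latency side of this trade-off is simple arithmetic: a block of N samples at sampling rate fs lasts N/fs seconds, and each additional block held in a buffer adds the same amount again. The sketch below computes this in Python and models double buffering with a hypothetical `DoubleBuffer` class; it is a single-threaded illustration, not a real-time-safe implementation, and the 48 kHz rate is an assumption.

```python
def buffer_latency_ms(block_size, sr_hz, n_blocks=1):
    """Delay, in milliseconds, of n_blocks blocks of block_size samples."""
    return 1000.0 * block_size * n_blocks / sr_hz

class DoubleBuffer:
    """Two halves: the output plays one while the synthesizer fills the other.

    Single-threaded illustration only; a real audio thread would typically
    use a lock-free handoff between the playing and filling sides.
    """
    def __init__(self, block_size):
        self.halves = [[0.0] * block_size, [0.0] * block_size]
        self.playing = 0                  # index of the half now being played

    def back(self):
        """The half that is free to receive new samples."""
        return self.halves[1 - self.playing]

    def swap(self):
        """Called when playback of the current half finishes."""
        self.playing = 1 - self.playing
        return self.halves[self.playing]  # the freshly filled half now plays

for size in (32, 256, 2048):
    print(size, round(buffer_latency_ms(size, 48000), 2))
# 32 0.67
# 256 5.33
# 2048 42.67
```

With double buffering, the output typically lags by about two blocks; for example, `buffer_latency_ms(256, 48000, 2)` gives about 10.7 ms.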
Figure 8.11 Audio callback loop that reads a wavetable and outputs audio to the DAC. Credit to Rodney Duplessis.
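The structure described in figure 8.11 can be sketched in Python. In this minimal sketch, `audio_callback` fills one block from a wavetable while the oscillator state persists between calls, and `run` stands in for the audio driver, collecting blocks where a real system would send them to the DAC. All names are hypothetical; real libraries such as PortAudio and JUCE define their own callback signatures, which are not shown here.

```python
import math

# Hypothetical single-cycle sine wavetable with 512 entries.
TABLE = [math.sin(2 * math.pi * i / 512) for i in range(512)]

class WavetableVoice:
    """Oscillator state that must persist from one callback to the next."""
    def __init__(self, freq_hz, sr=48000, amp=0.5):
        self.increment = freq_hz * len(TABLE) / sr  # table steps per sample
        self.phase = 0.0
        self.amp = amp

def audio_callback(voice, out_block):
    """Fill out_block in place with the next len(out_block) samples.

    In a real-time C/C++ callback one would also avoid memory allocation,
    file I/O, and locks; Python itself cannot guarantee real-time behavior.
    """
    size = len(TABLE)
    for k in range(len(out_block)):
        out_block[k] = voice.amp * TABLE[int(voice.phase)]
        voice.phase = (voice.phase + voice.increment) % size

def run(voice, n_blocks, block_size=256):
    """Stand-in for the audio driver: invoke the callback once per block."""
    block = [0.0] * block_size   # allocated once, reused on every call
    dac = []                     # collects what would be sent to the DAC
    for _ in range(n_blocks):
        audio_callback(voice, block)
        dac.extend(block)
    return dac
```

Because the phase lives in `WavetableVoice` rather than in the callback, rendering eight blocks of 128 samples produces exactly the same signal as one block of 1,024, which keeps the block boundaries inaudible.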
Lazzarini (2011d) presents the C++ code for the basic building blocks of a synthesizer engine: interpolating oscillators, envelope generators, filters, delay lines, flangers, convolution, and pitch shifters. The same author has written a two-volume set, Computer Music Instruments I and II, with detailed examples in Python, Csound, and Faust (Lazzarini 2017, 2019).
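The delay line is among the simplest of these building blocks. The following is a minimal Python sketch, not Lazzarini's C++ code: a circular buffer that returns each sample a fixed number of samples after it was written, plus a feedback echo built on it. The class and parameter names (`DelayLine`, `echo`, `feedback`, `mix`) are assumptions for illustration.

```python
class DelayLine:
    """Fixed delay implemented as a circular buffer."""
    def __init__(self, delay_samples):
        if delay_samples < 1:
            raise ValueError("delay must be at least one sample")
        self.buffer = [0.0] * delay_samples
        self.pos = 0                 # slot holding the oldest sample

    def read(self):
        """Return the sample written delay_samples writes ago."""
        return self.buffer[self.pos]

    def write(self, x):
        """Overwrite the oldest sample and advance the circular index."""
        self.buffer[self.pos] = x
        self.pos = (self.pos + 1) % len(self.buffer)

def echo(signal, delay_samples, feedback=0.5, mix=0.5):
    """Mix the dry input with a delayed copy that is fed back into the line."""
    line = DelayLine(delay_samples)
    out = []
    for x in signal:
        delayed = line.read()
        line.write(x + feedback * delayed)   # feedback yields repeating echoes
        out.append((1 - mix) * x + mix * delayed)
    return out

# An impulse through a 3-sample echo: each repeat is half as loud.
print(echo([1.0] + [0.0] * 9, 3))
```

Keeping `feedback` below 1 makes the echoes die away; a flanger is a close relative in which the delay time is short and modulated over time.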
The spread of software synthesizers has been fostered by plug-in standards. Steinberg’s Virtual Studio Technology (VST) format, introduced in 1996, continues as a broadly supported protocol. Other formats include Apple’s Audio Units (AU) and LV2 for Linux, among others. For tutorials on developing audio plug-ins, refer to Goudard and Muller (2003), Dobson (2011), and Pirkle (2019).